Extended Data Life Management (EDLM) Best Practices
The EDLM feature allows customers to perform media scans that validate the integrity of the data written to tape. These scans can be run on a proactive or reactive basis, and scan policies can be defined to suit the customer’s needs. It is important to define EDLM scan policies so that they minimize the performance impact on your normal backup jobs. Defining these policies can be complex because every customer has different backup requirements and media use cases. The purpose of this Best Practice article is to help define when and how EDLM scans should be performed so that you get the most from the feature while minimizing the impact on your backup requirements.
EDLM Media Scan Levels
There are three EDLM media scan levels: Quick, Normal, and Full. It’s important to understand which level to use for the media targeted for scanning. Each scan level takes a different amount of time, and you need to weigh the scan time against the probability of data loss on any given tape. We advise customers to use the “Quick” scan level for the majority of their scans, because the probability of data being damaged on a tape that is still within the library is very low. If your media is moved offsite or stored in less than optimal environmental conditions, it is suggested to perform a Normal or Full scan once the media is moved back into the library.
The key point is to evaluate the probability of losing data integrity: the higher the probability of data loss, the higher the scan level you should perform. For example, data is unlikely to be lost or damaged on media sitting inside a library that is within an environmentally controlled data center. By contrast, data loss or damage is much more probable for a tape that is exported from the library and sent offsite to a remote location. It’s up to the system administrator to weigh the EDLM scan level against the probability of data loss on any given piece of media. Choosing the wrong scan level for a given tape will not damage the media, but it can waste a tremendous amount of time on excessively long scans that are not needed. The scan levels are described below.
The first level is a “Quick” scan. The “Quick” scan leverages the read/write performance data stored on the media Cartridge Memory (CM) chip. The EDLM drive reviews the read/write performance data points and determines whether the data on the tape is Good, Suspect, or Bad. This scan level takes about 1 to 2 minutes and is the most commonly used because of its short scan time.
The second level is a “Normal” scan. The “Normal” scan leverages the CM data just like the “Quick” scan but also performs a read of data on the media. The EDLM drive reviews the read/write performance data points and determines whether the data on the tape is Good, Suspect, or Bad. This scan level takes about 20 minutes.
The third level is a “Full” scan. The “Full” scan leverages the CM data just like the “Quick” scan but also performs a full read of the media. The EDLM drive reviews the read/write performance data points and determines whether the data on the tape is Good, Suspect, or Bad. This scan level takes about 2 hours.
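To make the time trade-off between scan levels concrete, the following is a minimal sketch that estimates total scan time for a batch of tapes, using only the approximate per-tape durations quoted above. The function name, the dictionary, and the assumption that each EDLM drive scans one tape at a time (with load, unload, and robot move time ignored) are illustrative and are not part of any library interface.

```python
import math

# Approximate per-tape scan durations in minutes (low, high), as quoted above.
# These names are hypothetical and exist only for this illustration.
SCAN_MINUTES = {
    "Quick": (1, 2),      # Cartridge Memory (CM) data only
    "Normal": (20, 20),   # CM data plus a read of data on the media
    "Full": (120, 120),   # CM data plus a full read of the media
}


def estimate_scan_minutes(level, tape_count, edlm_drives=1):
    """Return (low, high) elapsed minutes to scan tape_count tapes.

    Assumes each EDLM drive scans one tape at a time and ignores
    load, unload, and robot move time.
    """
    if tape_count < 0 or edlm_drives < 1:
        raise ValueError("tape_count must be >= 0 and edlm_drives >= 1")
    low, high = SCAN_MINUTES[level]
    # Tapes are processed in rounds, one tape per drive per round.
    rounds = math.ceil(tape_count / edlm_drives)
    return (low * rounds, high * rounds)


if __name__ == "__main__":
    for level in SCAN_MINUTES:
        print(level, estimate_scan_minutes(level, tape_count=100, edlm_drives=2))
```

Under these assumptions, 100 tapes on two EDLM drives take roughly 50 to 100 minutes at the Quick level but about 6,000 minutes (100 hours) at the Full level, which is why the Quick level is suggested for media that has not left the library.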
EDLM Scan Results
There are three scan results: Good, Suspect, and Bad. Interpreting the scan results is required to know the functional state of the media. If you choose a low scan level and it returns a Suspect or Bad result, it is usually advised to run the next higher scan level to confirm that result. For example, if you run a Quick scan and the result is “Suspect”, it is suggested to run a Normal scan to confirm the result, since the Quick scan only assesses the read/write performance data on the CM chip within the media. You can review the EDLM scan test result details for more information on why the scan result was Suspect or Bad. A common misconception is that performing EDLM scans will “Fix” suspect or bad media. This is not true. The EDLM feature only assesses the current state of the media and reports the results of the scan. The scan results are described below.
The first is a “Good” test result which means that the EDLM scan process passed all of the tests performed based on the level of scan run. Media that has a “Good” test result is deemed to have full data integrity and should be able to be used for backups or restores as needed.
The second is a “Suspect” test result, which means that the EDLM scan passed some but not all of the tests performed for the scan level that was run. Some scan test failures are not considered critical and do not lead to data loss. A good example of a condition that would create a Suspect test result is a missing End of Data (EOD) for a data set written to the tape. The missing EOD is not optimal, but the tape drive can work around the issue and successfully append data to the tape. It is suggested that you attempt to migrate the data off of tapes with Suspect results to ensure data integrity. The majority of the time you can turn the Suspect tape back into a scratch tape and reuse it without issue.
The third is a “Bad” test result, which means that the EDLM scan failed some or all of its critical data integrity tests and that there is a high potential for data loss. It is suggested that you attempt to migrate the data off of the tape if at all possible, then decommission the tape and discontinue using it.
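The guidance above on interpreting results can be sketched as a small decision helper. This is only an illustration of the escalation rule (confirm a Suspect or Bad result with the next higher scan level before acting on it). The function names and return strings are hypothetical, and applying the same escalation step from Normal to Full is an assumption drawn from the "next higher level" wording rather than a documented rule.

```python
# Scan levels in ascending order of thoroughness, as described in this article.
LEVELS = ["Quick", "Normal", "Full"]


def next_scan_level(level):
    """Return the next higher scan level, or None if level is already Full."""
    index = LEVELS.index(level)
    return LEVELS[index + 1] if index + 1 < len(LEVELS) else None


def recommended_action(level, result):
    """Map a (scan level, scan result) pair to the suggested follow-up.

    Hypothetical helper: the strings returned are paraphrases of the
    guidance in this article, not messages produced by the library.
    """
    if result == "Good":
        return "no action: media can be used for backups or restores"
    if result not in ("Suspect", "Bad"):
        raise ValueError(f"unknown scan result: {result!r}")
    higher = next_scan_level(level)
    if higher is not None:
        # A lower-level scan returned Suspect or Bad: confirm it first.
        return f"confirm with a {higher} scan"
    if result == "Suspect":
        return "migrate data off the tape; it can usually be reused as scratch"
    return "migrate data off the tape if possible, then decommission it"


if __name__ == "__main__":
    print(recommended_action("Quick", "Suspect"))
    print(recommended_action("Full", "Bad"))
```

Note that the helper never suggests rescanning as a remedy, which mirrors the point that EDLM scans only assess media and do not repair it.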
EDLM association with Partitions
The EDLM feature can function with a dedicated EDLM partition and can also perform scans on media within other standard partitions. A dedicated EDLM partition does not have external SCSI host access or control. In other words, application software will never be able to interact directly with the dedicated EDLM partition.
The primary purpose of a dedicated EDLM partition is to provide a “Working Space” for the EDLM feature to perform scanning functions without disturbing standard partitions. For example, you can import media into the EDLM partition and then perform manual scans to validate the media data integrity before moving the media over to one of the standard partitions so it can be used for backups. Since the dedicated EDLM partition does not have any external host access or control, all movement of media in and out of this partition must be performed via the library GUI. You can also set up specific scan policies for the EDLM partition to suit your needs and minimize the manual library GUI interaction needed. For example, if you want to perform a scan on all tapes moved into the EDLM partition, you can set a policy of “Scan on Import” and then define the scan level desired (Quick, Normal, or Full). This policy will automatically perform an EDLM scan on any tape imported into the EDLM partition. Once the scan is complete, all you have to do is view the scan results to see if further action is required. The dedicated EDLM partition is usually reserved for manually rescanning media after it has produced a Suspect or Bad scan result while residing in a standard partition.
Standard partitions can also leverage the EDLM scanning functions through Proactive or Reactive scan policies. Proactive scan policies can be configured to perform a specific scan level after a piece of media has resided in the library for a given number of days, or after a given time since its last scan. You can also set up the same “Scan on Import” policy, which will trigger a scan on all media imported into the library. The Scan on Import policy is not commonly used because it can be disruptive to backup job requirements. For example, when system administrators import media into the library, it is very likely that they need to use that media right away for a backup or restore. In this scenario it does not make sense to set up a proactive scan policy that immediately scans the media that you need to use right away, especially if the scan level is set to Normal or Full, which can take an extended amount of time to perform.
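As an illustration of the time-based proactive policy described above, the following sketch decides whether a tape is due for a scan. It is a simplification under stated assumptions: the function and parameter names are hypothetical, and it merges the two triggers (days in the library and days since the last scan) into one rule by counting from the last scan when one exists and from the import date otherwise. The library itself may treat these as separate policy settings.

```python
from datetime import date


def proactive_scan_due(today, imported_on, last_scanned_on, interval_days):
    """Return True when the media is due for a proactive scan.

    Hypothetical rule: count days from the last scan if there was one,
    otherwise from the date the media was imported into the library.
    """
    reference = last_scanned_on if last_scanned_on is not None else imported_on
    return (today - reference).days >= interval_days


if __name__ == "__main__":
    today = date(2024, 6, 1)
    # Imported in January and never scanned: due under a 90-day interval.
    print(proactive_scan_due(today, date(2024, 1, 1), None, 90))
    # Same tape, but scanned a month ago: not yet due.
    print(proactive_scan_due(today, date(2024, 1, 1), date(2024, 5, 1), 90))
```

A longer interval combined with the Quick scan level keeps proactive scans from competing with backup jobs for media.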
Reactive scan policies can also be set up to perform a scan after a certain number of Tape Alert errors are produced while the tape is being used in a normal data drive. You can tune this reactive scan policy to trigger after a specific quantity of Tape Alert errors and also define the scan level that should be performed. This is another area where scan policies can be set too aggressively and impact your backup job requirements. It is recommended that you use a scan level of “Quick” and a Tape Alert quantity of “3”, as that will minimize the impact to the backup jobs while still performing the required scans for suspect media associated with Tape Alert events. As mentioned before, if the Quick scan result comes back as Suspect or Bad, it’s recommended to manually perform the next level of scan to validate the test results. In this case, that secondary manual scan would be a “Normal” scan.
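The reactive policy above amounts to a threshold check on the Tape Alert count. The sketch below models it with the recommended values (scan level “Quick”, Tape Alert quantity 3) as defaults. The class, field, and function names are hypothetical and do not correspond to an actual library configuration interface. Treating the threshold as "at or above" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReactivePolicy:
    """Hypothetical model of a reactive EDLM scan policy."""
    tape_alert_threshold: int = 3   # recommended quantity from the text above
    scan_level: str = "Quick"       # recommended scan level from the text above


def reactive_scan_level(policy, tape_alert_count):
    """Return the scan level to queue, or None if the threshold is not met."""
    if tape_alert_count >= policy.tape_alert_threshold:
        return policy.scan_level
    return None


if __name__ == "__main__":
    policy = ReactivePolicy()
    for count in (1, 2, 3):
        print(count, reactive_scan_level(policy, count))
```

Lowering the threshold or raising the scan level in this model corresponds to the overly aggressive settings the text warns about, since more tapes would be pulled away for longer scans.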
Balancing the EDLM Scan policies with your backup work flow
There are several different Proactive and Reactive EDLM scan policies that can be defined. The basic proactive scans are based on the time/date when the media was imported into the library or when the last scan took place. More advanced scan policies are driven by Tape Alert events associated with media performance within drives, or by media moving in and out of the library.
As mentioned before, it is up to the system administrator to assess their backup work flow and define EDLM scan policies that meet their needs without impacting that work flow. It’s important to assess the probability of media integrity becoming compromised and then set the scan policies to match that media risk. More often than not, administrators set their EDLM scan policies too aggressively given the risk probability of their media. Doing so will negatively impact the work flow of your backup jobs, because media is being EDLM scanned when it needs to be used for backups or restores. The best approach is to leverage the “Quick” scan level for the majority of the Proactive and Reactive scan policies. If the result of the Quick scan is Suspect or Bad, then manually move the tape into the EDLM partition and perform a Normal or Full scan to confirm the failed scan state of the media.