TSM - Checksum Feature FAQ

1. When do we create the checksum of the file on tape?

The checksum is created when the store occurs. If the store completes without a SCSI error or check condition, the checksum is saved in the filecompX table.

We generate a checksum for each 'segment' of the file, and we do so separately for each copy. With two copies, the checksum is therefore really generated twice, once per copy, since the copies could be written at different times (or even to different libraries, etc.).

Each checksum is stored under the key file_key+segment+copy+version.
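The keying scheme above can be sketched in a few lines of Python. This is only an illustration: the dictionary stands in for the checksum columns of the filecompX table, the function name and the file_key value are hypothetical, and MD5 is assumed because the checksums shown in this FAQ are 32 hex digits long.

```python
import hashlib

# Hypothetical in-memory stand-in for the checksum data kept in filecompX.
checksum_table = {}

def record_segment_checksum(file_key, segment, copy, version, data):
    """Compute a checksum over one segment's bytes and store it under its key."""
    digest = hashlib.md5(data).hexdigest()  # MD5 is an assumption, not confirmed
    checksum_table[(file_key, segment, copy, version)] = digest
    return digest

# One file (hypothetical file_key 1001, version 1), 2 segments, 2 copies.
segments = [b"first segment bytes", b"second segment bytes"]
for copy in (1, 2):
    for segment, data in enumerate(segments, start=1):
        record_segment_checksum(1001, segment, copy, 1, data)
```

Two segments and two copies produce four entries, one per file_key+segment+copy+version combination.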

Example:

You can view the checksum for each copy and each segment with "fsfileinfo -c <filename>".

For a file with 2 segments and 2 copies, the output would look like the following, with 2 entries for each copy (one entry for each copy and each segment of the most current version). The second-segment checksums shown are placeholders.

# fsfileinfo -c /stornext/snfs1/pc1/A_Big_File
-------------------------------------------------------------------------------
File Information Report                                Tue Jun 26 11:22:04 2012
Filename:    /stornext/snfs1/pc1/A_Big_File
Stored path: <same>
-------------------------------------------------------------------------------
  Last Modification: 02-may-2012 10:36:39
  Owner:         root             Location:         DISK AND TAPE
  Group:         root             Existing Copies:  1
  Access:        664              Target Copies:    1
  Target Stub:   0 (KB)           Existing Stub:    n/a
  File size:     1,066,025,676    Store:            MINTIME
  Affinity:      n/a              Reloc:            MINTIME
  Class:         pc1              Trunc:            MINTIME
  Clean DB Info: YES
  Media:         000026(1)
  Checksum:      0f6cb2cbab362d5d02743091509fd9d2(1,1)   ### copy(1) segment(1)
                 0fblah_blah_blah_blah0000000xxxx(1,2)   ### copy(1) segment(2)
                 0f6cb2cbab362d5d02743091509fd9d2(2,1)   ### copy(2) segment(1)
                 0fblah_blah_blah_blah0000000xxxx(2,2)   ### copy(2) segment(2)
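If the checksum entries need to be compared in a script, the "checksum(copy,segment)" entries can be pulled out of the report text. The sketch below is based only on the layout of the example above; the function name is hypothetical, and the real report layout may differ between releases.

```python
import re

# Matches entries such as "0f6cb2cbab362d5d02743091509fd9d2(1,1)".
ENTRY = re.compile(r"(\S+)\((\d+),(\d+)\)")

def parse_checksums(report):
    """Return {(copy, segment): checksum} from the Checksum block of a report."""
    result = {}
    in_block = False
    for line in report.splitlines():
        text = line.strip()
        if text.startswith("Checksum:"):
            in_block = True
            text = text[len("Checksum:"):]
        elif not in_block:
            continue
        match = ENTRY.search(text)
        if match:
            checksum, copy, segment = match.groups()
            result[(int(copy), int(segment))] = checksum
        elif text:
            in_block = False  # a non-empty line without an entry ends the block
    return result

sample = """\
Media: 000026(1)
Checksum: 0f6cb2cbab362d5d02743091509fd9d2(1,1)
          0fblah_blah_blah_blah0000000xxxx(1,2)
          0f6cb2cbab362d5d02743091509fd9d2(2,1)
          0fblah_blah_blah_blah0000000xxxx(2,2)
"""
checksums = parse_checksums(sample)
```

The result maps each (copy, segment) pair to its checksum string, so copy 1 and copy 2 of the same segment can be compared directly.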

2. When do we compare the checksum of the copy written to tape with the checksum of the file on disk?

On retrieval. When a file is retrieved, each segment is validated against the checksum stored for that file_key+segment+copy+version.
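A minimal sketch of per-segment validation on retrieve, assuming MD5 and a plain dictionary of stored checksums. The function name, the key values, and the data are hypothetical and are not taken from TSM itself.

```python
import hashlib

def failed_segments(stored, file_key, copy, version, retrieved):
    """Return the segment numbers whose retrieved bytes do not match."""
    failures = []
    for segment, data in sorted(retrieved.items()):
        expected = stored[(file_key, segment, copy, version)]
        if hashlib.md5(data).hexdigest() != expected:  # MD5 is an assumption
            failures.append(segment)
    return failures

# Checksums recorded at store time for copy 1 of a two-segment file.
stored = {
    (1001, 1, 1, 1): hashlib.md5(b"segment one").hexdigest(),
    (1001, 2, 1, 1): hashlib.md5(b"segment two").hexdigest(),
}

# Data read back from tape; segment 2 is damaged.
retrieved = {1: b"segment one", 2: b"segment tw0"}
failures = failed_segments(stored, 1001, 1, 1, retrieved)
```

Because each segment has its own checksum, a mismatch identifies which segment of which copy is bad, not just that the file as a whole is bad.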

3. How do we make sure the file has been written to tape correctly?

Regrettably, the checksum feature does not verify the actual write. Unless we get an indication back that the write was unsuccessful, we have no way of knowing that the data on tape is bad.

The checksum is generated and stored per segment as the write occurs, but it is not computed on what actually gets written to the tape media. It is validated only on retrieve.

4. If corruption occurred somewhere along the path MDC -> SAN -> Drive Buffer Cache, when would we detect it?

On retrieval only. A “read after write” verification was discussed, but the idea was rejected because a verify after every write carries a huge performance penalty.

Note: The drive itself still performs a read-after-write check from its buffer cache.
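The timeline described in questions 3 and 4 can be illustrated with a small simulation. Everything in it is hypothetical (the function names, the flipped byte, the use of MD5); it only shows why corruption introduced after the checksum is computed goes unnoticed at store time and surfaces on retrieve.

```python
import hashlib

def store(data, corrupt_in_transit=False):
    """Return (checksum recorded at store time, bytes that land on tape)."""
    checksum = hashlib.md5(data).hexdigest()  # computed on the outgoing data
    on_tape = bytearray(data)
    if corrupt_in_transit:
        on_tape[0] ^= 0xFF  # simulate a damaged byte between host and drive
    # No error is reported back, so the store is treated as successful.
    return checksum, bytes(on_tape)

def retrieve_ok(checksum, on_tape):
    """Return True if the data read back from tape matches the stored checksum."""
    return hashlib.md5(on_tape).hexdigest() == checksum

checksum, on_tape = store(b"payload", corrupt_in_transit=True)
detected_on_retrieve = not retrieve_ok(checksum, on_tape)

clean_checksum, clean_tape = store(b"payload")
clean_ok = retrieve_ok(clean_checksum, clean_tape)
```

In the corrupted case the store still returns normally; the mismatch only appears when the data is read back and its checksum is recomputed.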

5. What is the difference between our checksum feature and, e.g., a manual md5sum (besides the fact that SN creates checksums per segment)?

It is just automated to some extent.
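For comparison, a manual check yields a single digest for the whole file, while the feature keeps one per segment. The sketch below assumes MD5 and fixed-size slices purely for illustration; how TSM actually divides a file into segments is not described here, and the segment_size value is hypothetical.

```python
import hashlib

def whole_file_md5(data):
    """Digest of the entire content, comparable to what md5sum prints."""
    return hashlib.md5(data).hexdigest()

def per_segment_md5(data, segment_size):
    """One digest per fixed-size slice; segment_size is a hypothetical value."""
    return [
        hashlib.md5(data[offset:offset + segment_size]).hexdigest()
        for offset in range(0, len(data), segment_size)
    ]

data = b"abcdefghij"
file_digest = whole_file_md5(data)
segment_digests = per_segment_md5(data, segment_size=4)  # slices of 4, 4, 2 bytes
```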

6. We recently filed Bug 44464, which asks that truncation be denied for a file whose copies have different checksums. That way the disk copy would always remain available, but who would know about the mismatch unless a retrieve occurred?

Note: Bug 44464 only applies when more than one copy is stored. It still does not guarantee that both copies are uncorrupted, but it does reduce the risk to a very large degree.
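The check requested in Bug 44464 amounts to comparing the stored checksums of each segment across copies before allowing truncation. The following is a sketch of that idea only, not the actual TSM logic; the function name is hypothetical, and allowing truncation when only one copy exists is an assumption drawn from the note above.

```python
def safe_to_truncate(checksums):
    """checksums maps (copy, segment) -> checksum string for one file version."""
    copies = {copy for copy, _ in checksums}
    if len(copies) < 2:
        return True  # assumption: with one copy there is nothing to compare
    by_segment = {}
    for (copy, segment), value in checksums.items():
        by_segment.setdefault(segment, set()).add(value)
    # Every segment must have exactly one distinct checksum across all copies.
    return all(len(values) == 1 for values in by_segment.values())

matching = {(1, 1): "aa", (2, 1): "aa"}
mismatched = {(1, 1): "aa", (2, 1): "bb"}
single_copy = {(1, 1): "aa"}
```

Here safe_to_truncate(mismatched) returns False, so the disk copy would be kept. Matching checksums only show that both copies were generated from the same data at store time, which is why the note above stops short of calling this a guarantee.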

Other Enhancement Requests That Require Your Feedback

Bug 29707 - "fsretrieve -n" does not validate checksums

Bug 37955 - Alternate File Retrieval feature does not verify checksum

Bug 44464 - Prevent TSM from truncating files if the stored checksums do not match

Bug 50611 - Add option to validate checksums during medcopy

Bugs, Product Alerts & Bulletins

Bug 40169 - Checksum information for file is lost during fsmedcopy ( Product Alert 86 )

Bug 54321 - Stores to S3 media fail if checksums set in fs_sysparm