VTL: How to Identify a Cartridge with a Bad cmd File

 

 

 

SR Information: 1552008

Product / Software Version: DXi 67xx - 2.1.3

Problem Description: Cartridges with bad metadata cannot be handled (recycle, restore, backup, etc.)

Related PTRs:

  • Bug 33993 - Mounting a cart with bad CMD files will exhaust Mount Wait List entries
  • Bug 33488 - Escalation: SR1541770 - Recover operation failed due to bad cart CMD files which were corrupted on the source system

 

 

 IMPORTANT NOTE:

Although this procedure helps you resolve corruption in a VTC metadata file, we advise you to investigate the root cause of the corruption first. If needed, escalate the case to your backline support and ask whether any further data must be collected before you apply this guideline.

 

OVERVIEW

 

This article is based on a customer case that presented VTLs with bad metadata (in this case, a bad cmd file).

The article describes symptoms caused by the issue, as well as a resolution.

 

 

SYMPTOM (HOW TO IDENTIFY THE PROBLEM)

 

Here is a list of situations that you or your customer may encounter on the DXi if the problem described in the Overview occurs.

  

 

ERROR  - 07/18/13-21:43:04 - MetaDataLayout MetaDataLayout.cpp(96) [vcm] Read() - Pad Not Valid, byte = 0

WARN   - 07/18/13-21:43:04 - MirroredMetaDataLayout MirroredMetaDataLayout.cpp(94) [vcm] Read() - Directory Restore error.  Trying backup copy...

ERROR  - 07/18/13-21:43:04 - MetaDataLayout MetaDataLayout.cpp(96) [vcm] Read() - Pad Not Valid, byte = 1

ERROR  - 07/18/13-21:43:04 - MirroredMetaDataLayout MirroredMetaDataLayout.cpp(119) [vcm] Read() - Directory Restore of backup failed! POSSIBLE LOSS OF DATA ON CARTRIDGE

ERROR  - 07/18/13-21:43:04 - Cartridge_HS Cartridge_HS.cpp(1282) [vcm] RecoverConfig() - Failed to Read Cartridge Meta Data

ERROR  - 07/18/13-21:43:04 - VCM VCartMgr.cpp(515) [vcm] VCM_CreateVolumeHandler() - Cart recovery failed.
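The messages above can be searched for in a saved copy of the log. The following is a minimal sketch, not a supported tool: the function name cmd_corruption_summary is hypothetical, and the log path is whatever file you pass in. It only counts lines that contain the message strings shown in the excerpt.

```shell
#!/bin/bash
# Hypothetical helper: count occurrences of the metadata-corruption
# messages (strings taken from the log excerpt above) in a log file.
# Usage: cmd_corruption_summary <log_file>
cmd_corruption_summary() {
    local log_file="$1" pattern
    for pattern in 'Pad Not Valid' \
                   'Directory Restore of backup failed' \
                   'Failed to Read Cartridge Meta Data'; do
        # grep -c prints 0 when nothing matches, so every pattern is reported
        printf '%s: %s\n' "$pattern" "$(grep -c -- "$pattern" "$log_file")"
    done
}
```

Non-zero counts for all three strings would match the pattern seen in this case; a zero count does not by itself prove the cmd files are healthy.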

  

 

 bash-3.2$ syscli --export media --barcode V01354 --name DXi6700VTL

 

ERROR: One of the media is not in the VTL. (E2004502)

 

    Invalid barcode: V01354

Total: 1 barcodes that do not exist or belong to VTL!

 

    Invalid barcode: V01354

Total: 1 barcodes that do not exist or belong to VTL!

 

Tsunami.log will display the following errors:

 

ERROR  - 06/14/13-11:03:33 - GJVTLMediaActions GJVTLMediaActions.cpp(389) [webguid] runnableFunction() - Error recycling barcode V01354 in partition "*UNASSIGNED".  Error: Error erasing virtual volume 

ERROR  - 06/07/13-16:39:02 - Vmm.BaseCmd VTDMountMediumCmd.cc(152) [vmm] doExecute() - VTD <VD07CX1051BVE00837> failed the mount with error <0x80000000> for barcode <V01354>

 

 

INFO   - 06/07/13-12:43:47 - Vmm.ThreadMethods ThreadMethods.cc(1121) [vmm] expireMedium() - Expire of barcode <V02125> requested by reqId 1115885957

INFO   - 06/07/13-12:43:47 - Vmm.MediumWorker MediumWorker.cc(80) [vmm] runCmd() - Locking medium V02125

INFO   - 06/07/13-12:43:47 - VCM VCartMgr.cpp(731) [vcm] VCM_EraseVolumeHandler() - Request to erase Barcode V02125 received.

ERROR  - 06/07/13-12:43:47 - Vmm.State VirtualVaultState.cc(148) [vmm] ev() - Failed to erase cart data for medium V02125

INFO   - 06/07/13-12:43:47 - Vmm.MediumWorker MediumWorker.cc(87) [vmm] runCmd() - Unlocking medium V02125

ERROR  - 06/07/13-12:43:47 - GJVTLMediaActions GJVTLMediaActions.cpp(389) [webguid] runnableFunction() - Error recycling barcode V02125 in partition "*UNASSIGNED".  Error: Error erasing virtual volume

 

 

 

  

 

Here is an example of the error message you will find in the messages log. This message is from a DXi that has 80 mount licenses:

 

Apr  6 03:01:08 DXI6701-NYC vmm: E0000(1)<00000>:SRVCLOG RCOMP: 9 RINST: UNKNOWN VCOMP: 9 VINST: VD036CX1051BVE00837 VPINST: UNKNOWN EVENT: 65 TEXT: Total mounted virtual drives exceeds threshold limit of 80

 

NOTE: The following PTR reports the mount count issue: Bug 33993 - Mounting a cart with bad CMD files will exhaust Mount Wait List entries.
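To see how often the mount threshold message has been logged, a saved copy of the messages log can be summarized. This is a sketch only: the function name mount_limit_hits is hypothetical, and the location of the messages log on your system is an assumption you must supply as the argument.

```shell
#!/bin/bash
# Hypothetical helper: report each threshold limit seen in the
# "Total mounted virtual drives exceeds threshold limit of N" message
# and how many times it was logged.
# Usage: mount_limit_hits <messages_file>
mount_limit_hits() {
    local messages_file="$1"
    sed -n 's/.*Total mounted virtual drives exceeds threshold limit of \([0-9][0-9]*\).*/\1/p' "$messages_file" \
        | sort -n | uniq -c | awk '{ print "limit=" $2, "hits=" $1 }'
}
```

The function prints nothing when the message does not appear in the file.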

 

 

 

[root@DXI6701-NYC ~]# /opt/DXi/syscli --export media --barcode V01354 --name DXi6700VTL

 

Command completed successfully.

 

[root@DXI6701-NYC ~]# /opt/DXi/syscli --del  media --barcode V01354 --name *UNASSIGNED

 

Given barcode V01354 in vtl *UNASSIGNED does not qualify to be deleted

 

ERROR: Media has to be exported to delete it. (E2004101)

 

How to Identify If the Cartridge Has a Bad cmd File

 

1. Identify the labels of the cartridges that reported mount failures in tsunami.log:

 

WARN   - 05/18/13-14:01:10 - VTDMessenger vtdMessenger.cpp(333) [vtdVD043CX1051BVE00837] VTD_MountHandler() - Cartridge with barcode: V01598 RecoverConfig failed

ERROR  - 05/18/13-14:01:10 - Vmm.BaseCmd MsgUtil.cc(229) [vmm] sendAndReceive() - NS_VTD_VD043CX1051BVE00837 Failed the MountMedium Request. error <0x80000000 - Generic Failure>

ERROR  - 05/18/13-14:01:10 - Vmm.BaseCmd VTDMountMediumCmd.cc(152) [vmm] doExecute() - VTD <VD043CX1051BVE00837> failed the mount with error <0x80000000> for barcode <V01507>

WARN   - 05/18/13-14:17:04 - VTDMessenger vtdMessenger.cpp(333) [vtdVD05CX1051BVE00837] VTD_MountHandler() - Cartridge with barcode: V01507 RecoverConfig failed

ERROR  - 05/18/13-14:17:04 - Vmm.BaseCmd MsgUtil.cc(229) [vmm] sendAndReceive() - NS_VTD_VD05CX1051BVE00837 Failed the MountMedium Request. error <0x80000000 - Generic Failure>

ERROR  - 05/18/13-14:17:04 - Vmm.BaseCmd VTDMountMediumCmd.cc(152) [vmm] doExecute() - VTD <VD05CX1051BVE00837> failed the mount with error <0x80000000> for barcode <V01507>

WARN   - 05/19/13-15:08:44 - VTDMessenger vtdMessenger.cpp(333) [vtdVD2DCX1051BVE00837] VTD_MountHandler() - Cartridge with barcode: V01692 RecoverConfig failed
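The barcodes in step 1 can be pulled out of a copy of tsunami.log rather than read by eye. This is a minimal sketch: failed_mount_barcodes is a hypothetical name, the log path is supplied by you, and the match relies only on the "Cartridge with barcode: ... RecoverConfig failed" wording shown in the excerpt above.

```shell
#!/bin/bash
# Hypothetical helper: list each barcode that logged "RecoverConfig failed"
# together with the number of times it failed.
# Usage: failed_mount_barcodes <tsunami_log_file>
failed_mount_barcodes() {
    local log_file="$1"
    grep 'RecoverConfig failed' "$log_file" \
        | sed -n 's/.*Cartridge with barcode: \([^ ]*\) RecoverConfig failed.*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Each output line is a barcode followed by its failure count; the barcodes listed are the candidates to check in step 2.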

 

2. Follow the procedure below to identify a cartridge with a bad cmd file (for this procedure, we will use cartridge label V01598):

 

 

[root@DXI6701-NYC Carts]# /opt/DXi/bpw/bin/bwcat ./V01598_CX1051BVE00837_620939/cmd1 > /snfs/tmp/SR1552008/20130617/cmd1/V01598_CX1051BVE00837_620939

 

 

[root@DXI6701-NYC Carts]# /opt/DXi/cmd_decoder -d /snfs/tmp/SR1552008/20130617/cmd1/V01598_CX1051BVE00837_620939

 

    Type   count    RT Blocks  size     RT Bytes

 

    End of CMD

 

    Total Blocks            Total Count

    0                       0

 

Here is an example of what to expect from a healthy cmd1:

 

[root@DXI6701-NYC cmd1]# cat decoder-result-cmd1-V02570

    Type   count    RT Blocks  size     RT Bytes

    Block  1        1          1024     1024

    FM     1

    Block  1        3          1024     2048

    Block  7813     7816       262144   2048133120

    FM     1

    Block  1        7818       1024     2048134144

    Block  7813     15631      262144   4096265216

    FM     1

    Block  1        15633      1024     4096266240

    Block  7813     23446      262144   6144397312

    FM     1

    Block  1        23448      1024     6144398336

    Block  7813     31261      262144   8192529408

    FM     1

    Block  1        31263      1024     8192530432

    Block  7813     39076      262144   10240661504

    FM     1

    Block  1        39078      1024     10240662528

    Block  7813     46891      262144   12288793600

    FM     1

    . . .

    Block  1        1507345    1024     395030715392

    Block  7813     1515158    262144   397078846464

    FM     1

    Block  1        1515160    1024     397078847488

    Block  7813     1522973    262144   399126978560

    FM     1

    Block  1        1522975    1024     399126979584

    Block  2132     1525107    262144   399685870592

    FM     1

 

    End of CMD

 

    Total Blocks            Total Count

    1525108                 399685870592

 

[root@DXI6701-NYC cmd1]#
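When several cartridges need checking, the comparison between the bad and healthy decoder output can be reduced to the Total Blocks value. The sketch below assumes the cmd_decoder output has been saved to a text file (as with decoder-result-cmd1-V02570 above) and that the layout matches the two examples; the function names cmd_total_blocks and cmd_is_empty are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the "Total Blocks" value from saved
# cmd_decoder output (first field of the first non-blank line that
# follows the "Total Blocks   Total Count" header).
# Usage: cmd_total_blocks <saved_decoder_output>
cmd_total_blocks() {
    awk '
        /Total Blocks[ \t]+Total Count/ { header_seen = 1; next }
        header_seen && NF > 0 { print $1; exit }
    ' "$1"
}

# Succeeds (exit status 0) when the decoder reported zero total blocks,
# as in the bad cmd1 example above.
cmd_is_empty() {
    [ "$(cmd_total_blocks "$1")" = "0" ]
}
```

For example, `cmd_is_empty decoder-result-cmd1-V02570 && echo "suspect cmd file"` would print nothing for the healthy output shown above. A zero total matches the bad example in this case, but treat it as a pointer for further review rather than proof of corruption.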

 

RESOLUTION

 

NOTE: This resolution does not recover the data on the cartridge; please advise the customer accordingly. The process expires/deletes the data on the cartridge and then deletes the cartridge itself. If the customer expects to recover the data, confirm whether the cartridge was replicated or duplicated to a remote site. Also keep in mind that if the partition is replicated via namespace, the customer must recover the whole partition in order to recover the cartridge in question.

 

1. Expire the images of the cartridge and delete the media on the backup server. This case uses NetBackup (NBU) commands. Please advise the customer to request assistance from the backup application vendor/manufacturer for these steps:

 

bpexpdate -m <TapeNumber> -d 0 -host <MediaServer> -force

vmdelete -m <media-id>

 

2. After you expire/delete the cartridge, execute Garbage Collection. Cartridges should be under the UNASSIGNED pool.

 

Note: Even after you have followed this procedure, you may have problems recycling the cartridge (see info under the Symptoms heading, above).

 

3. If the bad cartridges are causing the mount count to reach the limit allowed by the license even though not all mounts are actually in use, reset the mount counts with the following procedure:

 

rm /var/DXi/processwatcher

/etc/init.d/VMM_Control stop

/etc/init.d/VMM_Control start

touch /var/DXi/processwatcher

 

4. To permanently delete the bad cartridge, follow this procedure:

 

4.1. Check the cartridge:

 

[root@DXI6701-NYC ~]# /opt/DXi/vmmdbcorrect -b V02125

 Please see output in a file </usr/tmp/media_status.txt>

 can not get a library ref

 No changes have been made to the system

 

[root@DXI6701-NYC ~]# cat /usr/tmp/media_status.txt

 Output of utililty vmmDbCorrectionUtil

 barcode        vmc       state    to do (recommendations)

  V02125      UNASSIGNED   128  virt.tape with no data on disk; set state to 0;

 No changes have been made to the system

 

4.2. Apply the correction to the cartridge:

 

[root@DXI6701-NYC ~]# /opt/DXi/vmmdbcorrect -b V02125 -d

 Please see output in a file </usr/tmp/media_status.txt>

 can not get a library ref

 All changes have been applied successfully

 

4.3. Check the cartridge again (note that the entry for the cartridge now ends with "Done"):

 

[root@DXI6701-NYC ~]# cat /usr/tmp/media_status.txt

 Output of utililty vmmDbCorrectionUtil

 barcode        vmc       state         changes

  V02125      UNASSIGNED   128  virt.tape with no data on disk; set state to 0; Done

 All changes have been applied successfully
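The before/after check in steps 4.1 to 4.3 can be scripted when more than one cartridge is involved. This is a sketch under the assumption that the status file keeps the layout shown above (barcode in the first column, "Done" as the last word once the change is applied); correction_applied is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) only if the status file
# contains a line for the given barcode and that line ends with "Done".
# Usage: correction_applied <barcode> [status_file]
correction_applied() {
    local barcode="$1" status_file="${2:-/usr/tmp/media_status.txt}"
    awk -v bc="$barcode" '
        $1 == bc { found = 1; if ($NF == "Done") applied = 1 }
        END { exit (found && applied) ? 0 : 1 }
    ' "$status_file"
}
```

For example, `correction_applied V02125 && echo "correction applied"` prints the message only after step 4.2 has been run for that barcode.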

 

4.4. Query the cartridge to see the status:

 

[root@DXI6701-NYC ~]# /opt/DXi/vmmdbmedia -q -b V02125

Successful database connection

query for V02125

VmmMedium:

          barcode: <V02125>

              vmc: <UNASSIGNED>

        med group: <0>

         med type: <3 = LTO-L3>

         capacity: <400000000000>

    raw data size: <0>

   comp data size: <0>

       last block: <0>

     path to tape: <partitions/UnassignedDedup/Carts/V02125_CX1051BVE00837_831253>

            state: <0000000000000 = 0>

        imp state: <0>

     entry method: <0>

   shelveOnExport: <0>

      virtualOnly: <1>

  preserve labels: <0>

      data format: <1>

  virt wr protect: <0>

  phys wr protect: <0>

      last access: <Sun Feb 24 11:29:15 2013 (1361723355)>

 last phys access: <Undefined (0)>

        operation: <0>

    op start time: <Undefined (0)>

  op cluster node: <0>

    op src device: <NONE>

    op dst device: <NONE>

 phys mount reqId: <0>

    current cmd: <NONE>

    change made: <0>
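Only the state field of this query matters for the check in step 4.4, so it can be filtered out of the output. The sketch below assumes the "state: <... = N>" layout shown above; medium_state is a hypothetical name, and the filter deliberately ignores the "imp state" line.

```shell
#!/bin/bash
# Hypothetical helper: read vmmdbmedia query output on stdin and print
# the decimal value from the "state: <... = N>" line.
# Lines such as "imp state:" are skipped because the match is anchored
# to the start of the (indented) field name.
medium_state() {
    sed -n 's/^[[:space:]]*state:[[:space:]]*<.* = \([0-9][0-9]*\)>.*/\1/p'
}
```

Usage, with the same query command as above: `/opt/DXi/vmmdbmedia -q -b V02125 | medium_state`. In this case the expected value after the correction is 0.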

 

4.5. Delete the cartridge:

 

[root@DXI6701-NYC ~]# /opt/DXi/syscli --del  media --barcode V02125 --name *UNASSIGNED

 

Command completed successfully.

 

4.6. You can verify in the GUI that the cartridge is no longer under the UNASSIGNED pool.

 

 

Important Note: For this case, recovery of data is only possible if the customer has replicated the cartridge to a remote site.

 

 

 

 

Notes

A big smoking gun in the log is the following set of messages (which specifically show a problem with the cmd file(s)):

 

ERROR  - 07/18/13-21:43:04 - MetaDataLayout MetaDataLayout.cpp(96) [vcm] Read() - Pad Not Valid, byte = 0
WARN   - 07/18/13-21:43:04 - MirroredMetaDataLayout MirroredMetaDataLayout.cpp(94) [vcm] Read() - Directory Restore error.  Trying backup copy...
ERROR  - 07/18/13-21:43:04 - MetaDataLayout MetaDataLayout.cpp(96) [vcm] Read() - Pad Not Valid, byte = 1
ERROR  - 07/18/13-21:43:04 - MirroredMetaDataLayout MirroredMetaDataLayout.cpp(119) [vcm] Read() - Directory Restore of backup failed! POSSIBLE LOSS OF DATA ON CARTRIDGE
ERROR  - 07/18/13-21:43:04 - Cartridge_HS Cartridge_HS.cpp(1282) [vcm] RecoverConfig() - Failed to Read Cartridge Meta Data
ERROR  - 07/18/13-21:43:04 - VCM VCartMgr.cpp(515) [vcm] VCM_CreateVolumeHandler() - Cart recovery failed.
 

This example is from a Replication Recovery, but a mount would have most of the same messages (from vtd instead of vcm).

Note by Mike Donohoe on 07/19/2013 11:24 AM

