VTL: How to Identify a Cartridge with a Bad cmd File
SR Information: 1552008
Product / Software Version: DXi 67xx - 2.1.3
Problem Description: Cartridges with bad metadata cannot be handled (recycle, restore, backup, etc.)
Related PTRs:
IMPORTANT NOTE:
Although this procedure provides guidance for resolving corruption in a VTC metadata file, we advise you to investigate the root cause of the corruption first. If needed, escalate the case to your backline support and ask whether any further data must be collected before you apply this guideline.
OVERVIEW
This article is based on a customer case that presented VTLs with bad metadata (in this case, a bad cmd file).
The article describes symptoms caused by the issue, as well as a resolution.
SYMPTOM (HOW TO IDENTIFY THE PROBLEM)
Here is a list of situations that you or your customer may encounter on the DXi if the problem described in the Overview occurs.
ERROR - 07/18/13-21:43:04 - MetaDataLayout MetaDataLayout.cpp(96) [vcm] Read() - Pad Not Valid, byte = 0
WARN - 07/18/13-21:43:04 - MirroredMetaDataLayout MirroredMetaDataLayout.cpp(94) [vcm] Read() - Directory Restore error. Trying backup copy...
ERROR - 07/18/13-21:43:04 - MetaDataLayout MetaDataLayout.cpp(96) [vcm] Read() - Pad Not Valid, byte = 1
ERROR - 07/18/13-21:43:04 - MirroredMetaDataLayout MirroredMetaDataLayout.cpp(119) [vcm] Read() - Directory Restore of backup failed! POSSIBLE LOSS OF DATA ON CARTRIDGE
ERROR - 07/18/13-21:43:04 - Cartridge_HS Cartridge_HS.cpp(1282) [vcm] RecoverConfig() - Failed to Read Cartridge Meta Data
ERROR - 07/18/13-21:43:04 - VCM VCartMgr.cpp(515) [vcm] VCM_CreateVolumeHandler() - Cart recovery failed.
bash-3.2$ syscli --export media --barcode V01354 --name DXi6700VTL
ERROR: One of the media is not in the VTL. (E2004502)
Invalid barcode: V01354
Total: 1 barcodes that do not exist or belong to VTL!
Invalid barcode: V01354
Total: 1 barcodes that do not exist or belong to VTL!
Tsunami.log will display the following errors:
ERROR - 06/14/13-11:03:33 - GJVTLMediaActions GJVTLMediaActions.cpp(389) [webguid] runnableFunction() - Error recycling barcode V01354 in partition "*UNASSIGNED". Error: Error erasing virtual volume
ERROR - 06/07/13-16:39:02 - Vmm.BaseCmd VTDMountMediumCmd.cc(152) [vmm] doExecute() - VTD <VD07CX1051BVE00837> failed the mount with error <0x80000000> for barcode <V01354>
INFO - 06/07/13-12:43:47 - Vmm.ThreadMethods ThreadMethods.cc(1121) [vmm] expireMedium() - Expire of barcode <V02125> requested by reqId 1115885957
INFO - 06/07/13-12:43:47 - Vmm.MediumWorker MediumWorker.cc(80) [vmm] runCmd() - Locking medium V02125
INFO - 06/07/13-12:43:47 - VCM VCartMgr.cpp(731) [vcm] VCM_EraseVolumeHandler() - Request to erase Barcode V02125 received.
ERROR - 06/07/13-12:43:47 - Vmm.State VirtualVaultState.cc(148) [vmm] ev() - Failed to erase cart data for medium V02125
INFO - 06/07/13-12:43:47 - Vmm.MediumWorker MediumWorker.cc(87) [vmm] runCmd() - Unlocking medium V02125
ERROR - 06/07/13-12:43:47 - GJVTLMediaActions GJVTLMediaActions.cpp(389) [webguid] runnableFunction() - Error recycling barcode V02125 in partition "*UNASSIGNED". Error: Error erasing virtual volume
Here is an example of the error message you will find in the messages log. This message is from a DXi that has 80 mount licenses:
Apr 6 03:01:08 DXI6701-NYC vmm: E0000(1)<00000>:SRVCLOG RCOMP: 9 RINST: UNKNOWN VCOMP: 9 VINST: VD036CX1051BVE00837 VPINST: UNKNOWN EVENT: 65 TEXT: Total mounted virtual drives exceeds threshold limit of 80
NOTE: The following PTR reports the mount count issue: Bug 33993 - Mounting a cart with bad CMD files will exhaust Mount Wait List entries.
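To check quickly whether a system has logged the threshold message shown above, a small filter can help. This is only a sketch: the function name `mount_threshold_events` is hypothetical, the pattern is copied from the single sample line above, and the location of the messages log (commonly /var/log/messages on Linux) is an assumption to confirm on your system.

```shell
# Print "<virtual drive instance> <limit>" for every mount-threshold event.
# Reads the files given as arguments, or stdin when no file is given.
# Hypothetical helper; the pattern is based on the sample line in this article.
mount_threshold_events() {
    grep -h 'Total mounted virtual drives exceeds threshold limit' "$@" |
        sed -n 's/.*VINST: \([^ ]*\) .*threshold limit of \([0-9]*\).*/\1 \2/p'
}

# Example usage (the log path is an assumption):
#   mount_threshold_events /var/log/messages
```

No output means the message was not found in the files that were searched.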
[root@DXI6701-NYC ~]# /opt/DXi/syscli --export media --barcode V01354 --name DXi6700VTL
Command completed successfully.
[root@DXI6701-NYC ~]# /opt/DXi/syscli --del media --barcode V01354 --name *UNASSIGNED
Given barcode V01354 in vtl *UNASSIGNED does not qualify to be deleted
ERROR: Media has to be exported to delete it. (E2004101)
How to Identify If the Cartridge Has a Bad cmd File
1. Identify the labels of the cartridges that reported mount failures in tsunami.log:
WARN - 05/18/13-14:01:10 - VTDMessenger vtdMessenger.cpp(333) [vtdVD043CX1051BVE00837] VTD_MountHandler() - Cartridge with barcode: V01598 RecoverConfig failed
ERROR - 05/18/13-14:01:10 - Vmm.BaseCmd MsgUtil.cc(229) [vmm] sendAndReceive() - NS_VTD_VD043CX1051BVE00837 Failed the MountMedium Request. error <0x80000000 - Generic Failure>
ERROR - 05/18/13-14:01:10 - Vmm.BaseCmd VTDMountMediumCmd.cc(152) [vmm] doExecute() - VTD <VD043CX1051BVE00837> failed the mount with error <0x80000000> for barcode <V01507>
WARN - 05/18/13-14:17:04 - VTDMessenger vtdMessenger.cpp(333) [vtdVD05CX1051BVE00837] VTD_MountHandler() - Cartridge with barcode: V01507 RecoverConfig failed
ERROR - 05/18/13-14:17:04 - Vmm.BaseCmd MsgUtil.cc(229) [vmm] sendAndReceive() - NS_VTD_VD05CX1051BVE00837 Failed the MountMedium Request. error <0x80000000 - Generic Failure>
ERROR - 05/18/13-14:17:04 - Vmm.BaseCmd VTDMountMediumCmd.cc(152) [vmm] doExecute() - VTD <VD05CX1051BVE00837> failed the mount with error <0x80000000> for barcode <V01507>
WARN - 05/19/13-15:08:44 - VTDMessenger vtdMessenger.cpp(333) [vtdVD2DCX1051BVE00837] VTD_MountHandler() - Cartridge with barcode: V01692 RecoverConfig failed
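When the log is large, the barcodes from step 1 can be pulled out with a short filter. The sketch below is illustrative only: the function name `failed_recover_barcodes` is hypothetical, the pattern is taken from the "RecoverConfig failed" lines shown above, and the path to tsunami.log must be supplied for your system.

```shell
# List each barcode that logged "RecoverConfig failed", with its hit count.
# Reads the files given as arguments, or stdin when no file is given.
# Hypothetical helper; the pattern is based on the log lines in this article.
failed_recover_barcodes() {
    grep -h 'RecoverConfig failed' "$@" |
        sed -n 's/.*Cartridge with barcode: \([A-Za-z0-9]*\) RecoverConfig failed.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Example usage (replace the placeholder with the real log location):
#   failed_recover_barcodes /path/to/tsunami.log
```

Each output line has the form "barcode count". Every barcode listed is a candidate for the cmd file check in the next step.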
2. Follow the procedure below to identify a cartridge with a bad cmd file (for this procedure, we will use cartridge label V01598):
[root@DXI6701-NYC Carts]# /opt/DXi/bpw/bin/bwcat ./V01598_CX1051BVE00837_620939/cmd1 > /snfs/tmp/SR1552008/20130617/cmd1/V01598_CX1051BVE00837_620939
[root@DXI6701-NYC Carts]# /opt/DXi/cmd_decoder -d /snfs/tmp/SR1552008/20130617/cmd1/V01598_CX1051BVE00837_620939
Type count RT Blocks size RT Bytes
End of CMD
Total Blocks Total Count
0 0
Here is an example of what to expect from a healthy cmd1:
[root@DXI6701-NYC cmd1]# cat decoder-result-cmd1-V02570
Type count RT Blocks size RT Bytes
Block 1 1 1024 1024
FM 1
Block 1 3 1024 2048
Block 7813 7816 262144 2048133120
FM 1
Block 1 7818 1024 2048134144
Block 7813 15631 262144 4096265216
FM 1
Block 1 15633 1024 4096266240
Block 7813 23446 262144 6144397312
FM 1
Block 1 23448 1024 6144398336
Block 7813 31261 262144 8192529408
FM 1
Block 1 31263 1024 8192530432
Block 7813 39076 262144 10240661504
FM 1
Block 1 39078 1024 10240662528
Block 7813 46891 262144 12288793600
FM 1
. . .
Block 1 1507345 1024 395030715392
Block 7813 1515158 262144 397078846464
FM 1
Block 1 1515160 1024 397078847488
Block 7813 1522973 262144 399126978560
FM 1
Block 1 1522975 1024 399126979584
Block 2132 1525107 262144 399685870592
FM 1
End of CMD
Total Blocks Total Count
1525108 399685870592
[root@DXI6701-NYC cmd1]#
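The difference between the two decoder outputs above (no Block or FM entries and totals of "0 0" for the bad cmd file, non-zero totals for the healthy one) can be tested in a script. This is a minimal sketch under the assumption that the decoder output always has the layout shown in this article. The function name `cmd_is_empty` is hypothetical and is not part of the DXi tools.

```shell
# Exit 0 if the decoder output on stdin reports zero total blocks, else exit 1.
# Hypothetical helper; it assumes the totals appear on the line that follows
# the "Total Blocks Total Count" header, as in the examples in this article.
cmd_is_empty() {
    awk '
        /^Total Blocks/ {
            # Read the totals line and accept it only if it starts with a number.
            if ((getline) > 0 && $1 ~ /^[0-9]+$/) { blocks = $1 + 0; found = 1 }
        }
        END { if (found && blocks == 0) exit 0; exit 1 }
    '
}

# Example usage with the decoder command from step 2:
#   /opt/DXi/cmd_decoder -d /snfs/tmp/SR1552008/20130617/cmd1/V01598_CX1051BVE00837_620939 |
#       cmd_is_empty && echo "cmd1 decodes as empty"
```

The function also exits 1 when the totals line is missing, so truncated output is not reported as an empty cmd file. Treat an empty result as a sign of a bad cmd file only for a cartridge that is expected to contain data.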
RESOLUTION
NOTE: This resolution does not recover the data on the cartridge; advise the customer of this before proceeding. The process expires and deletes the data on the cartridge and then deletes the cartridge itself. If the customer expects to recover the data, confirm whether the customer has replicated or duplicated the cartridge to a remote site. Also keep in mind that if the partition is replicated via namespace, the customer must recover the whole partition in order to recover the cartridge in question.
1. Expire the images on the cartridge and delete the media on the backup server. This case uses NetBackup (NBU) commands. Advise the customer to request assistance from the backup application vendor for these steps:
bpexpdate -m <TapeNumber> -d 0 -host <MediaServer> -force
vmdelete -m <media-id>
2. After you expire/delete the cartridge, execute Garbage Collection. Cartridges should be under the UNASSIGNED pool.
Note: Even after you have followed this procedure, you may have problems recycling the cartridge (see info under the Symptoms heading, above).
3. If the bad cartridges cause the mount count to reach the limit allowed by the license even though not all mounts are in use, reset the mount counts with the following procedure:
rm /var/DXi/processwatcher
/etc/init.d/VMM_Control stop
/etc/init.d/VMM_Control start
touch /var/DXi/processwatcher
4. To permanently delete the bad cartridge, follow this procedure:
4.1. Check the cartridge:
[root@DXI6701-NYC ~]# /opt/DXi/vmmdbcorrect -b V02125
Please see output in a file </usr/tmp/media_status.txt>
can not get a library ref
No changes have been made to the system
[root@DXI6701-NYC ~]# cat /usr/tmp/media_status.txt
Output of utililty vmmDbCorrectionUtil
barcode vmc state to do (recommendations)
V02125 UNASSIGNED 128 virt.tape with no data on disk; set state to 0;
No changes have been made to the system
4.2. Apply the correction to the cartridge:
[root@DXI6701-NYC ~]# /opt/DXi/vmmdbcorrect -b V02125 -d
Please see output in a file </usr/tmp/media_status.txt>
can not get a library ref
All changes have been applied successfully
4.3. Check the cartridge again (note that the entry for the cartridge now ends with "Done"):
[root@DXI6701-NYC ~]# cat /usr/tmp/media_status.txt
Output of utililty vmmDbCorrectionUtil
barcode vmc state changes
V02125 UNASSIGNED 128 virt.tape with no data on disk; set state to 0; Done
All changes have been applied successfully
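The check in step 4.3 can also be scripted, which helps when several cartridges are corrected in a row. The sketch below is an assumption-laden illustration: the function name `correction_applied` is hypothetical, and it relies only on the media_status.txt layout shown above (barcode in the first column, "Done" as the last word once the change is applied).

```shell
# Exit 0 if the status file has a line for the barcode that ends with "Done".
# Usage: correction_applied BARCODE [STATUS_FILE]
# Hypothetical helper; the default path is the one printed by vmmdbcorrect above.
correction_applied() {
    awk -v bc="$1" '
        $1 == bc && $NF == "Done" { ok = 1 }
        END { if (ok) exit 0; exit 1 }
    ' "${2:-/usr/tmp/media_status.txt}"
}

# Example usage:
#   correction_applied V02125 && echo "V02125 corrected"
```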
4.4. Query the cartridge to see the status:
[root@DXI6701-NYC ~]# /opt/DXi/vmmdbmedia -q -b V02125
Successful database connection
query for V02125
VmmMedium:
barcode: <V02125>
vmc: <UNASSIGNED>
med group: <0>
med type: <3 = LTO-L3>
capacity: <400000000000>
raw data size: <0>
comp data size: <0>
last block: <0>
path to tape: <partitions/UnassignedDedup/Carts/V02125_CX1051BVE00837_831253>
state: <0000000000000 = 0>
imp state: <0>
entry method: <0>
shelveOnExport: <0>
virtualOnly: <1>
preserve labels: <0>
data format: <1>
virt wr protect: <0>
phys wr protect: <0>
last access: <Sun Feb 24 11:29:15 2013 (1361723355)>
last phys access: <Undefined (0)>
operation: <0>
op start time: <Undefined (0)>
op cluster node: <0>
op src device: <NONE>
op dst device: <NONE>
phys mount reqId: <0>
current cmd: <NONE>
change made: <0>
4.5. Delete the cartridge:
[root@DXI6701-NYC ~]# /opt/DXi/syscli --del media --barcode V02125 --name *UNASSIGNED
Command completed successfully.
4.6. You can verify in the GUI that the cartridge is no longer under the UNASSIGNED pool.
Important Note: For this case, recovery of data is only possible if the customer has replicated the cartridge to a remote site.