CBT backup fails due to changeid mismatch (DRAFT)
SR Information: 1611512
Product / Software Version: vmPRO 3.1.1 - vCenter 5.0
Problem Description: Virtual machine backups get stuck in the 'queued' state
Related PTRs:
Bug 5927 - Customer Quantum IT: Backups fail due to changeid mismatch for VMs with same-named disks on different datastores
Overview
This article describes, and provides a workaround for, a specific case found in the field where vmPRO fails backups because the changeid recorded for a virtual disk does not match the current changeid.
When CBT is enabled, vmPRO requests the changeid of each virtual disk of the virtual machine being processed for backup. For version control, vmPRO saves this changeid followed by a sequence number. For example:
52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de/1911
Where:
52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de is the change id
1911 is the sequence/version number
Whenever vmPRO backs up a VM using CBT, it takes the changeid and sequence number saved from the prior backup and compares them with the changeid and sequence number collected during the current backup.
The sections below describe a specific condition in which the changeids do not match, along with the symptoms and the workaround.
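To make the comparison concrete, here is a minimal shell sketch of the check described above. It is an illustration only, not vmPRO code; the variable names are invented, but the two values are the ones used throughout this article.
changeid_incr="52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de/1911"   # saved from the prior backup
changeid_cur="52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201"    # collected during the current backup
# Split each value into the changeid proper and the sequence/version number.
incr_id=${changeid_incr%/*}; incr_seq=${changeid_incr##*/}
cur_id=${changeid_cur%/*};   cur_seq=${changeid_cur##*/}
if [ "$incr_id" = "$cur_id" ]; then
    echo "changeid matches (sequence $incr_seq -> $cur_seq); the incremental backup can rely on CBT"
else
    echo "Changeid_incr does not match changeid_cur; a CBT reset and new full backup are needed"
fi
When the two changeids differ, as in the failing case below, vmPRO requests an automatic CBT reset (see the 'requested auto Changed Block Tracking reset' message in the debug log later in this article).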
Symptom
The backup fails, and an error similar to the following is logged:
2013-10-09 19:58:25.595918: ERROR [thread:140005896836864, where:spawn.c:246, error:8] /usr/local/pancetera/bin/cbt_getblocklist.pl --server ppoesx23.quantum.com --portnumber 443 --username root --password ****** --vmuuid 420aa639-4bbd-889c-b122-e9836130794f --vmname PPOWXENAPPV5 --vmdkpath /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk --vmdksize 9663676416 --changeid_only n --resultsfile /tmp/queryHAJZaD --vendor_id Quantum --appliance_uuid 2a2bea9ee24c11e187f90050568a6300 --debug 1 --changeid_incr 52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de/1911
To collect more detail, enable CBT debug logging on the vmPRO appliance:
reg set appliance.cbt.debug 1
Note: to change the setting back to the default, execute:
reg set appliance.cbt.debug 0
With debug enabled, reproduce the problem and you will find the following messages in vm_proxy.log (the vmPRO appliance log):
27997 get_changeid checking kPV5/PPOWXENAPPV5.vmdk
27998 vmdk_size: 9663676416
27999 results_file: /tmp/queryHAJZaD
28000 get_changeid ds_path: /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/
28001 get_changeid disk_path: PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28002 get_changeid ds_url: /vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/
28003 get_changeid ds_url: /vmfs/volumes/502530b3-8d820f93-02bc-a4badb125366/
28004 get_changeid ds_url: /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/
28005 get_changeid found ds_url match: /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/
28006 get_changeid ds_name: DS_PPO_T1_119_R1
28007 get_changeid layout_disk: [DS_PPO_T1_119_R1] PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28008 get_changeid diskFile: [DS_PPO_T1_110_R1] PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28009 get_changeid diskFile: [DS_PPO_T1_110_R1] PPOWXENAPPV5/PPOWXENAPPV5-000001.vmdk
28010 get_changeid diskFile: [DS_PPO_T1_119_R1] PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28011 get_changeid found diskFile match: [DS_PPO_T1_119_R1] PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28012 get_changeid disk_key: 2001
28013 get_changeid diskFile: [DS_PPO_T1_114] PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28014 get_changeid diskFile: [DS_PPO_T1_114] PPOWXENAPPV5/PPOWXENAPPV5-000001.vmdk
28015 get_changeid checking key: 200
28016 get_changeid checking key: 201
28017 get_changeid checking key: 300
28018 get_changeid checking key: 100
28019 get_changeid checking key: 400
28020 get_changeid checking key: 600
28021 get_changeid checking key: 700
28022 get_changeid checking key: 500
28023 get_changeid checking key: 12000
28024 get_changeid checking key: 1000
28025 get_changeid checking key: 3000
28026 get_changeid checking key: 4000
28027 get_changeid checking key: 2000
28028 get_changeid checking key: 2001
28029 get_changeid found key match: [DS_PPO_T1_119_R1] PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28030 found device key: 2001
28031 found changeId: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201
28032 received device_key: 2001
28033 received changeid_cur: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201
28034 modified changeid_base_1: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/1
28035 compare_changeids cur 0: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61
28036 compare_changeids cur 1: 2201
28037 compare_changeids incr 0: 52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de
28038 compare_changeids incr 1: 1911
28039 Changeid_incr does not match changeid_cur for /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28040 getblocklist changeid_incr: 52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de/1911
28041 getblocklist changeid_cur: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201
28042 2013-10-09 19:58:25.904121: open_pancbt: requested auto Changed Block Tracking reset for: PPOWXENAPPV5
28043 get_changeid checking key: 12000
28044 get_changeid checking key: 1000
28045 get_changeid checking key: 3000
28046 get_changeid checking key: 4000
28047 get_changeid checking key: 2000
Note that the debug output above shows the same virtual disk and the same changeid_incr/sequence number (52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de/1911) that appear in the cbt_getblocklist.pl error message shown at the start of this section.
Where does the changeid_cur (52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201) come from? To answer that, you need to check the virtual machine's files on the ESX host. See the next steps for directions.
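Before moving to the ESX host, it can help to pull just the relevant lines out of a copy of vm_proxy.log. This grep is a convenience suggestion, not a required step, and assumes you have the log file in your working directory:
# grep -n -e "does not match changeid_cur" -e "getblocklist changeid" vm_proxy.log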
Additional notes:
To check the virtual machine's files, collect a vm-support bundle from the ESX host. The bundle has the format <esx_host_name>-vmsupport-YYYY-MM-DD@HH-MM-SS.tgz, as in the example below:
ppoesx23.quantum.com-vmsupport-2013-10-10@13-21-55.tgz
# tar -xzvf ppoesx23.quantum.com-vmsupport-2013-10-10\@13-21-55.tgz
# ll
drwxr-xr-x 12 eeskenbe domain^users 4096 Oct 15 10:41 esx-ppoesx23.quantum.com-2013-10-10--19.20
-rw-r--r-- 1 eeskenbe domain^users 159608097 Oct 10 16:20 ppoesx23.quantum.com-vmsupport-2013-10-10@13-21-55.tgz
# cd esx-ppoesx23.quantum.com-2013-10-10--19.20
# find ./vmfs/volumes -name PPOWXENAPPV5
./vmfs/volumes/502530b3-8d820f93-02bc-a4badb125366/PPOWXENAPPV5
./vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5
./vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5
Above we see that the virtual machine has directories on three different datastores, which means its virtual disks are spread across different datastores. You can find out where each disk lives by running a simple find:
# find ./vmfs/volumes -name "PPOWXENAPPV5*" | grep vmdk
./vmfs/volumes/502530b3-8d820f93-02bc-a4badb125366/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
./vmfs/volumes/502530b3-8d820f93-02bc-a4badb125366/PPOWXENAPPV5/PPOWXENAPPV5.vmdk
./vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
./vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk
./vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
./vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5/PPOWXENAPPV5.vmdk
The datastores are:
502530b3-8d820f93-02bc-a4badb125366
4d654ab0-08d768f8-95a0-a4badb1252fe
4b902464-cf573416-7e5f-001f29cefe88
Additional notes:
A virtual disk is composed of two files:
*.vmdk and *-flat.vmdk
If CBT is enabled, each virtual disk has an additional file named *-ctk.vmdk
In the example above, because we are looking at the ESX log bundle, the *-flat.vmdk files are not included. This is OK; we do not need them for this analysis.
# hexdump -C -n 64 ./vmfs/volumes/502530b3-8d820f93-02bc-a4badb125366/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
00000000 a2 72 19 f6 01 00 00 00 00 00 00 00 00 02 00 00 |.r..............|
00000010 00 00 40 06 00 00 00 00 af 07 00 00 ab 07 00 00 |..@.............|
00000020 00 00 00 00 80 00 00 00 00 80 0c 00 52 44 cf 9b |............RD..|
00000030 8d 0e 90 30 5e 0b ee 35 76 30 f3 de 00 00 00 00 |...0^..5v0......|
00000040
# hexdump -C -n 64 ./vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
00000000 a2 72 19 f6 01 00 00 00 00 00 00 00 00 02 00 00 |.r..............|
00000010 00 00 20 01 00 00 00 00 a8 08 00 00 99 08 00 00 |.. .............|
00000020 00 00 00 00 80 00 00 00 00 40 02 00 52 26 e2 c1 |.........@..R&..|
00000030 19 1c 52 de 01 48 54 15 a2 e5 99 61 00 00 00 00 |..R..HT....a....|
00000040
# hexdump -C -n 64 ./vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
00000000 a2 72 19 f6 01 00 00 00 00 00 00 00 00 02 00 00 |.r..............|
00000010 00 00 40 01 00 00 00 00 78 07 00 00 72 07 00 00 |..@.....x...r...|
00000020 00 00 00 00 80 00 00 00 00 80 02 00 52 1c 4e 25 |............R.N%|
00000030 0f 31 14 73 52 80 e4 d8 3d f1 e4 ee 00 00 00 00 |.1.sR...=.......|
00000040
Where is the changeid located?
Starting at offset 0x2c of the -ctk.vmdk header (the last four bytes of the 00000020 row plus the first twelve bytes of the 00000030 row in the dumps above), we have the 16-byte changeid of the virtual disk.
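If you only want those 16 bytes, you can point hexdump directly at that offset. The commands below are a convenience only, using the paths from this example and the 0x2c offset described above:
# hexdump -C -s 0x2c -n 16 ./vmfs/volumes/502530b3-8d820f93-02bc-a4badb125366/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
# hexdump -C -s 0x2c -n 16 ./vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5-ctk.vmdk
Per the full dumps above, the first command returns 52 44 cf 9b 8d 0e 90 30 5e 0b ee 35 76 30 f3 de (the changeid_incr from the failing backup) and the second returns 52 26 e2 c1 19 1c 52 de 01 48 54 15 a2 e5 99 61 (the changeid_cur).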
Looking again at the vm_proxy.log excerpt above, note that vmPRO is comparing the changeids of two different virtual disks. The changeid_cur (52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61) is the one stored in the ctk file on datastore 4d654ab0-08d768f8-95a0-a4badb1252fe, the disk actually being backed up, while the changeid_incr saved from the prior backup (52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de) matches the ctk file of the same-named disk on datastore 502530b3-8d820f93-02bc-a4badb125366:
28039 Changeid_incr does not match changeid_cur for /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28040 getblocklist changeid_incr: 52 44 cf 9b 8d 0e 90 30-5e 0b ee 35 76 30 f3 de/1911
28041 getblocklist changeid_cur: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201
What you would expect under normal conditions is that the changeid proper is identical and only the sequence number has advanced, something like:
28039 Changeid_incr does not match changeid_cur for /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk
28040 getblocklist changeid_incr: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2200
28041 getblocklist changeid_cur: 52 26 e2 c1 19 1c 52 de-01 48 54 15 a2 e5 99 61/2201
Resolution
Monitor PTR 5927 for the fix status. In the interim, the workaround is to rename the virtual machine's disks so that each has a unique name.
The procedure to rename a virtual disk is complex. The basic guidelines below will familiarize you with the process, but we advise you to explain the process to the customer and strongly recommend that the customer engage VMware technical support for assistance with this procedure.
Steps to rename the virtual disk (requires downtime of the virtual machine):
# vmkfstools -E /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5_1.vmdk
The command above should rename all three files (the descriptor, the -flat.vmdk, and the -ctk.vmdk). Information about this command can be found in the following VMware knowledge base article:
http://kb.vmware.com/kb/1002491
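As a quick sanity check after the rename (a suggested step, not part of the original SR, using the paths from this example), confirm that all three files now carry the new name:
# ls /vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5_1*
Given the note above that all three files are renamed, you should see PPOWXENAPPV5_1.vmdk, PPOWXENAPPV5_1-flat.vmdk, and PPOWXENAPPV5_1-ctk.vmdk.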
Next, locate the virtual machine's .vmx file so you can point it at the renamed disk:
# find ./ -name PPOWXENAPPV5.vmx
./vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5/PPOWXENAPPV5.vmx
Inside the vmx, you'll find the following lines:
scsi0:1.present = "true"
scsi0:1.fileName = "/vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5.vmdk"
scsi0:1.deviceType = "scsi-hardDisk"
scsi0:1.ctkEnabled = "true"
sched.scsi0:1.shares = "normal"
Change to the new renamed disk:
scsi0:1.present = "true"
scsi0:1.fileName = "/vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5_1.vmdk"
scsi0:1.deviceType = "scsi-hardDisk"
scsi0:1.ctkEnabled = "true"
sched.scsi0:1.shares = "normal"
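To confirm the edit took effect (again a suggested check, not from the original SR), grep the vmx on the ESX host for the disk entry:
# grep scsi0:1.fileName /vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5/PPOWXENAPPV5.vmx
scsi0:1.fileName = "/vmfs/volumes/4d654ab0-08d768f8-95a0-a4badb1252fe/PPOWXENAPPV5/PPOWXENAPPV5_1.vmdk"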
After the disk is renamed and the vmx edited, the virtual machine must be unregistered and re-registered so that ESX picks up the updated configuration. Clarification and instructions about these commands can be found in the following VMware article:
http://kb.vmware.com/kb/2012964
Here are the steps:
# vim-cmd vmsvc/getallvms
Vmid Name File Guest OS Version Annotation
13 DXi0_2.2.0_build47644 [Managed] BLUE_DXi0_2.2.0_build47644/BLUE_DXi0_2.2.0_build47644.vmx rhel6_64Guest vmx-07 IP: 10.20.234.82
15 RHEL-v6.2 [Unmanaged] RHEL-/RHEL-.vmx rhel6_64Guest vmx-07
17 Centos 6.3 [Unmanaged] Centos 6.3/Centos 6.3.vmx centos64Guest vmx-08
18 Marsden_Windows7 [Unmanaged] Marsden_Windows7/Marsden_Windows7.vmx windows7_64Guest vmx-08
# vim-cmd vmsvc/unregister <vimid_of_the_virtual_machine>
# vim-cmd solo/registervm /vmfs/volumes/4b902464-cf573416-7e5f-001f29cefe88/PPOWXENAPPV5/PPOWXENAPPV5.vmx
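Once the VM is registered again, you can confirm it is back in the inventory (and note its new Vmid) with the same listing command used above:
# vim-cmd vmsvc/getallvms | grep PPOWXENAPPV5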
Important Note: this procedure re-registers the VM, but it will not place it back in its original resource pool if it was previously registered under one. For this reason, we again advise that the customer seek assistance from VMware; extra steps may be required after re-registration to restore the original resource pool and other settings.