Duplicate Volume Found During Hardware Expansion
SR Information: 1576782
Product / Software Version: 2.2.1 - Issue found on DXi 45xx; it may also be detectable on other platforms (excluding the issue related to bug 33488)
Problem Description: Data Loss during rebuild due to puncture stripe
Related PTRs:
Overview
The DXi fails to perform the expansion if it finds existing labels on the newly added disks.
This article shows how you can identify and handle the issue, as well as what data you need to collect before you escalate to your backline in case assistance is needed.
Note: The log information, command output, and data collection outlined below were gathered from an existing SR. You may see different device information or results. If in doubt, we recommend that you escalate the case to your backline team.
Before and after any expansion, we strongly recommend collecting the following information, which will help diagnose any issues that occur:
# /usr/cvfs/bin/cvlabel -L
# /opt/DXi/3ware/3waretool.sh --map
(applicable for DXi systems with a 3ware controller)
# find /sys -name block:sd\*
# hexdump -C device -s 00000200 -n 20000 > device.out
(where device is the device node, /dev/sd##, of a disk used by the snfs file system. You need one command per device, so depending on the number of devices you may want to write a shell script to collect this information.)
Example:
hexdump -C /dev/sdm -s 00000200 -n 20000 > sdm.out
(applicable for DXi systems with a 3ware controller)
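As suggested above, the per-device hexdump step can be scripted. The following is a minimal sketch, assuming bash and that you pass the /dev/sd## device nodes used by the snfs file system as arguments; the function name collect_label_dumps and the output directory are hypothetical. It reuses the hexdump options from the manual command unchanged.

```shell
#!/bin/bash
# Hypothetical helper: run the article's hexdump command against each device
# and save the result as OUTDIR/<device name>.out (for example sdm.out).
# Usage: collect_label_dumps OUTDIR DEVICE [DEVICE...]
collect_label_dumps() {
    local outdir=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: collect_label_dumps OUTDIR DEVICE [DEVICE...]" >&2
        return 2
    fi
    mkdir -p "$outdir" || return 1
    local dev name rc=0
    for dev in "$@"; do
        name=$(basename "$dev")    # /dev/sdm -> sdm
        # Same options as the manual command in the data collection list.
        if hexdump -C "$dev" -s 00000200 -n 20000 > "$outdir/$name.out"; then
            echo "saved $outdir/$name.out"
        else
            echo "failed to read $dev" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (run as root on the DXi, before and after the expansion):
# collect_label_dumps /tmp/label_dumps_before /dev/sdm /dev/sdq
```

Keeping one output file per device, named after the device node, matches the sdm.out naming in the example above and makes before/after comparisons straightforward.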
Symptom (How to Identify the Problem)
The messages log shows events similar to the following (collected from an existing SR; your device information may differ):
(32484) Jun 24 15:15:02 ustchqtm1 systemupgrade: [CheckVolumes] : disable_foreign_volumes() - Disabling volume: /dev/sdm /c2/u0 0017522C9B7C2500F160 SSD
(32485) Jun 24 15:15:03 ustchqtm1 srvclogcli: E0000(1)<00008>:SRVCLOG RCOMP: 8 RINST: CheckVolumes VCOMP: 8 VINST: CheckVolumes VPINST: UNKNOWN EVENT: 41 TEXT: Found duplicate volume /c2/u0, disabling volume. Ticket creation time: 06/24 15:15:03 CDT
(32486) Jun 24 15:15:03 ustchqtm1 systemupgrade: [CheckVolumes] : disable_foreign_volumes() - Disabling volume: /dev/sdq /c2/u2 YGJA3Z1A9B7C3400A650 DATA
(32489) Jun 24 15:15:04 ustchqtm1 srvclogcli: E0000(1)<00008>:SRVCLOG RCOMP: 8 RINST: CheckVolumes VCOMP: 8 VINST: CheckVolumes VPINST: UNKNOWN EVENT: 41 TEXT: Found duplicate volume /c2/u2, disabling volume. Ticket creation time: 06/24 15:15:04 CDT
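To check a system for this symptom quickly, you can search the messages log for the two event types shown above. This is a minimal sketch, assuming the log is /var/log/messages (pass another path, for example a file from a collect bundle, if needed); the function names are hypothetical and the search patterns are taken from the sample entries.

```shell
#!/bin/bash
# Hypothetical helpers for spotting the "duplicate volume" symptom.
# Assumes messages-log lines formatted like the sample entries above.

# Print every CheckVolumes event that disabled a volume or reported a duplicate.
# Usage: find_duplicate_volumes [LOGFILE]   (default: /var/log/messages)
find_duplicate_volumes() {
    local log=${1:-/var/log/messages}
    grep -E 'disable_foreign_volumes\(\) - Disabling volume|Found duplicate volume' "$log"
}

# Print only the unique unit IDs (for example /c2/u0) reported as duplicates.
# Usage: duplicate_units [LOGFILE]
duplicate_units() {
    local log=${1:-/var/log/messages}
    # "Found duplicate volume /c2/u0, ..." -> field 4 is the unit ID
    grep -o 'Found duplicate volume [^,]*' "$log" | awk '{print $4}' | sort -u
}
```

For the sample entries above, duplicate_units would list /c2/u0 and /c2/u2.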
The following additional symptom may or may not appear on a 3ware array, depending on the port to which the EM is connected:
In this case, consult PTR 33225 for the workaround or fix.
Before the rescan: Output of "tw_cli /c2 show" reports Units U1 and U3 under Enclosure 0
Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u1    RAID-1    OK      -       -       -       186.254   RiW    OFF
u3    RAID-6    OK      -       -       256K    14901.1   RiW    OFF

VPort  Status  Unit  Size       Type  Phy  Encl-Slot     Model
------------------------------------------------------------------------------
p10    OK      u1    186.31 GB  SATA  -    /c2/e0/slt0   INTEL SSDSA2BZ200G3
p11    OK      u1    186.31 GB  SATA  -    /c2/e0/slt1   INTEL SSDSA2BZ200G3
p12    OK      u3    1.82 TB    SATA  -    /c2/e0/slt2   Hitachi HUA723020AL
p13    OK      u3    1.82 TB    SATA  -    /c2/e0/slt3   Hitachi HUA723020AL
p14    OK      u3    1.82 TB    SATA  -    /c2/e0/slt4   Hitachi HUA723020AL
p15    OK      u3    1.82 TB    SATA  -    /c2/e0/slt5   Hitachi HUA723020AL
p16    OK      u3    1.82 TB    SATA  -    /c2/e0/slt6   Hitachi HUA723020AL
p17    OK      u3    1.82 TB    SATA  -    /c2/e0/slt7   Hitachi HUA723020AL
p18    OK      u3    1.82 TB    SATA  -    /c2/e0/slt8   Hitachi HUA723020AL
p19    OK      u3    1.82 TB    SATA  -    /c2/e0/slt9   Hitachi HUA723020AL
p20    OK      u3    1.82 TB    SATA  -    /c2/e0/slt10  Hitachi HUA723020AL
p21    OK      u3    1.82 TB    SATA  -    /c2/e0/slt11  Hitachi HUA723020AL
After the Rescan: Output of "tw_cli /c2 show" now reports Units U1 and U3 under Enclosure 1 and the new Units U0 and U2 under Enclosure 0
Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK      -       -       -       186.254   RiW    OFF
u1    RAID-1    OK      -       -       -       186.254   RiW    OFF
u2    RAID-6    OK      -       -       256K    14901.1   RiW    OFF
u3    RAID-6    OK      -       -       256K    14901.1   RiW    OFF

VPort  Status  Unit  Size       Type  Phy  Encl-Slot     Model
------------------------------------------------------------------------------
p8     OK      u0    186.31 GB  SATA  -    /c2/e0/slt0   STEC SSDSA M16ISD2-
p9     OK      u0    186.31 GB  SATA  -    /c2/e0/slt1   STEC SSDSA M16ISD2-
p10    OK      u2    1.82 TB    SATA  -    /c2/e0/slt2   Hitachi HUA723020AL
p11    OK      u1    186.31 GB  SATA  -    /c2/e1/slt0   INTEL SSDSA2BZ200G3
p12    OK      u1    186.31 GB  SATA  -    /c2/e1/slt1   INTEL SSDSA2BZ200G3
p13    OK      u3    1.82 TB    SATA  -    /c2/e1/slt2   Hitachi HUA723020AL
p14    OK      u2    1.82 TB    SATA  -    /c2/e0/slt3   Hitachi HUA723020AL
p15    OK      u2    1.82 TB    SATA  -    /c2/e0/slt4   Hitachi HUA723020AL
p16    OK      u2    1.82 TB    SATA  -    /c2/e0/slt5   Hitachi HUA723020AL
p17    OK      u2    1.82 TB    SATA  -    /c2/e0/slt6   Hitachi HUA723020AL
p18    OK      u3    1.82 TB    SATA  -    /c2/e1/slt3   Hitachi HUA723020AL
p19    OK      u3    1.82 TB    SATA  -    /c2/e1/slt4   Hitachi HUA723020AL
p20    OK      u3    1.82 TB    SATA  -    /c2/e1/slt5   Hitachi HUA723020AL
p21    OK      u3    1.82 TB    SATA  -    /c2/e1/slt6   Hitachi HUA723020AL
p22    OK      u2    1.82 TB    SATA  -    /c2/e0/slt7   Hitachi HUA723020AL
p23    OK      u2    1.82 TB    SATA  -    /c2/e0/slt8   Hitachi HUA723020AL
p24    OK      u2    1.82 TB    SATA  -    /c2/e0/slt9   Hitachi HUA723020AL
p25    OK      u2    1.82 TB    SATA  -    /c2/e0/slt10  Hitachi HUA723020AL
p26    OK      u3    1.82 TB    SATA  -    /c2/e1/slt7   Hitachi HUA723020AL
p27    OK      u3    1.82 TB    SATA  -    /c2/e1/slt8   Hitachi HUA723020AL
p28    OK      u3    1.82 TB    SATA  -    /c2/e1/slt9   Hitachi HUA723020AL
p29    OK      u3    1.82 TB    SATA  -    /c2/e1/slt10  Hitachi HUA723020AL
p30    OK      u2    1.82 TB    SATA  -    /c2/e0/slt11  Hitachi HUA723020AL
p31    OK      u3    1.82 TB    SATA  -    /c2/e1/slt11  Hitachi HUA723020AL
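Comparing these two listings by eye is error-prone. The following is a minimal sketch, assuming you saved the "tw_cli /c2 show" output before and after the rescan to text files: it reads the port lines and prints, for each unit, the enclosure(s) that hold its member ports, taken from the Encl-Slot column. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize saved "tw_cli /cN show" output as
# "<unit> <enclosure>[,<enclosure>...]", one line per unit, sorted.
# Usage: units_by_enclosure FILE
units_by_enclosure() {
    awk '
        # Port lines start with pNN and name a unit (uNN) in column 3.
        $1 ~ /^p[0-9]+$/ && $3 ~ /^u[0-9]+$/ {
            unit = $3
            for (i = 1; i <= NF; i++) {
                # Locate the Encl-Slot field, e.g. /c2/e0/slt0.
                if ($i ~ /^\/c[0-9]+\/e[0-9]+\/slt[0-9]+$/) {
                    split($i, part, "/")    # "", "c2", "e0", "slt0"
                    key = unit SUBSEP part[3]
                    if (!(key in seen)) {
                        seen[key] = 1
                        if (unit in encl) encl[unit] = encl[unit] "," part[3]
                        else encl[unit] = part[3]
                    }
                }
            }
        }
        END { for (u in encl) print u, encl[u] }
    ' "$1" | sort
}
```

Usage: `tw_cli /c2 show > after.txt`, then `units_by_enclosure after.txt`. For the listings above, the "before" output is u1 e0 and u3 e0, while the "after" output is u0 e0, u1 e1, u2 e0 and u3 e1, which makes the move of the existing units to enclosure 1 easy to see.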
******
Resolution (Workaround or Fix)
There are two options to solve this issue.
Option 1:
If permitted, request another EM to perform the expansion.
Option 2:
Important Notes Before You Use This Option:
A command is available to remove the label from a LUN. Before removing the label, however, make sure of the following:
If in doubt, consult your senior engineer or escalate to a backline engineer, and provide the information you collected as described above.
How to check whether the LUNs from the EM have already been added:
The binary that expands the snfs file system is cvupdatefs. Details of the expansion are logged in the file expandFS.log, which collect gathers into the app-info directory.
Two conditions are possible:
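When working from a collect bundle, a small helper can locate expandFS.log and show its most recent entries. This is a minimal sketch, assuming the bundle has already been extracted to a local directory; the function name and the default of 50 lines are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: find expandFS.log under an extracted collect bundle
# (collect places it in the app-info directory) and print its last lines.
# Usage: show_expand_log BUNDLE_DIR [LINES]
show_expand_log() {
    local bundle=$1 lines=${2:-50} found=0 f
    # Process substitution keeps the loop in the current shell,
    # so the "found" flag survives after the loop.
    while IFS= read -r f; do
        found=1
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done < <(find "$bundle" -type f -name expandFS.log -path '*app-info*')
    if [ "$found" -eq 0 ]; then
        echo "no expandFS.log found under $bundle" >&2
        return 1
    fi
}
```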
4.1. The new LUNs of the new EM have labels in mapfile.txt, but the output of cvlabel -L shows no labels on those LUNs.
-- In this case, you can remove the new LUN entries, reboot the DXi, and retry the expansion.
4.2. The new LUNs of the new EM have labels in mapfile.txt, and the output of cvlabel -L also shows labels on those LUNs.
-- In this case, proceed with the next step.
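The cvlabel side of this check can be sketched as follows, assuming you saved the `cvlabel -L` output to a file and that mapfile.txt already shows labels for the new LUNs (confirm the mapfile.txt side manually; the sketch does not read it). It also assumes that a labeled device appears in the saved output on a line whose first field is its device node; verify that against your own output before relying on it. The function name and its messages are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: for each new LUN, report whether saved "cvlabel -L"
# output shows a label on it, mapped to condition 4.1 or 4.2 above.
# ASSUMPTION: a labeled device is listed on a line whose first field is its
# device node (e.g. "/dev/sdm ..."). Check this on your system first.
# Usage: classify_new_luns CVLABEL_OUTPUT_FILE DEVICE [DEVICE...]
classify_new_luns() {
    local listing=$1
    shift
    local dev
    for dev in "$@"; do
        # Exact first-field match, so /dev/sdm does not match /dev/sdma.
        if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$listing"; then
            echo "$dev: label shown by cvlabel -L (condition 4.2)"
        else
            echo "$dev: no label shown by cvlabel -L (condition 4.1)"
        fi
    done
}

# Example:
# /usr/cvfs/bin/cvlabel -L > /tmp/cvlabel_L.txt
# classify_new_luns /tmp/cvlabel_L.txt /dev/sdm /dev/sdq
```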
ATTENTION: Do not execute this command on the wrong LUN:
# cvlabel -U /dev/sd##
where /dev/sd## is the device node of the new LUN from the new EM that already carried a label during the expansion process.
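Because running cvlabel -U against the wrong device removes the label from that LUN, you may prefer to wrap the command in a guard. This is a minimal sketch: the function name, the explicit list of expected new LUNs, and the typed confirmation are hypothetical safeguards, not part of the DXi tooling. The CVLABEL variable defaults to /usr/cvfs/bin/cvlabel (the path used in the data collection step) and can be set to echo for a dry run.

```shell
#!/bin/bash
# Hypothetical guard around "cvlabel -U": only unlabels a device that is in
# the caller's explicit list of new EM LUNs, and only after the operator
# types the device name again. Set CVLABEL=echo for a dry run.
CVLABEL=${CVLABEL:-/usr/cvfs/bin/cvlabel}

# Usage: safe_unlabel DEVICE ALLOWED_DEVICE [ALLOWED_DEVICE...]
safe_unlabel() {
    local dev=$1
    shift
    local ok=0 allowed answer
    for allowed in "$@"; do
        if [ "$dev" = "$allowed" ]; then
            ok=1
        fi
    done
    if [ "$ok" -ne 1 ]; then
        echo "refusing: $dev is not in the list of new LUNs" >&2
        return 1
    fi
    # The prompt is shown only when input comes from a terminal.
    read -r -p "Unlabel $dev? Type the device name again to confirm: " answer
    if [ "$answer" != "$dev" ]; then
        echo "aborted: confirmation did not match" >&2
        return 1
    fi
    "$CVLABEL" -U "$dev"
}

# Example (dry run first, then for real):
# CVLABEL=echo safe_unlabel /dev/sdm /dev/sdm /dev/sdq
# safe_unlabel /dev/sdm /dev/sdm /dev/sdq
```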
Additional Note:
On the DXi, the following file may be created during expansion:
/opt/DXi/FilesystemExpansionInProgress
It is an empty file.
If the expansion process was interrupted without any other error and you need to continue it, create this file (using the touch command); the DXi will then try to resume the expansion on the next reboot.
While following the procedure above, you may want to check whether this file exists.
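The check and the touch step described above can be sketched as two small functions. This is a minimal sketch, assuming bash; the function names are hypothetical, and the MARKER variable (defaulting to the path given in this article) exists only so the helpers can be pointed elsewhere for testing.

```shell
#!/bin/bash
# Hypothetical helpers for the expansion-in-progress marker file.
MARKER=${MARKER:-/opt/DXi/FilesystemExpansionInProgress}

# Report whether the marker exists; returns 0 if present, 1 if absent.
expansion_marker_present() {
    if [ -e "$MARKER" ]; then
        if [ -s "$MARKER" ]; then
            echo "present (not empty): $MARKER"
        else
            echo "present (empty): $MARKER"
        fi
        return 0
    fi
    echo "absent: $MARKER"
    return 1
}

# Create the marker so the DXi tries to resume the expansion on the next
# reboot. Use only when the interrupted expansion had no other errors.
request_expansion_resume() {
    if expansion_marker_present >/dev/null; then
        return 0    # already present, nothing to do
    fi
    touch "$MARKER"
}
```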
This page was generated by the BrainKeeper Enterprise Wiki, © 2018