Duplicate Volume Found During Hardware Expansion

SR Information: 1576782

 

Product / Software Version: 2.2.1. Issue found on the DXi 45xx; possibly also detectable on other platforms (excluding the issue related to bug 33488).

 

Problem Description: Data Loss during rebuild due to puncture stripe

 

Related PTRs:

  • Bug 33225 - JBOD cleaning script can delete the wrong devices

 

Overview

 

The DXi fails to perform the expansion if it finds existing labels on the newly added disks.

 

This article shows how to identify and handle the issue, and what data to collect before escalating to your backline team if assistance is needed.

 

Note: The log information, command output, and data collection outlined below were gathered from an existing SR. You may see different device information or results. If in doubt, we recommend that you escalate the case to your backline team.

 


Additional Information

 

Before and after any expansion, we highly recommend collecting the following information, which helps diagnose any issues that occur:

 

# /usr/cvfs/bin/cvlabel -L

 

# /opt/DXi/3ware/3waretool.sh --map

    (applicable only to DXi systems with a 3ware controller)

 

# find /sys -name block:sd\*

 

# hexdump -C device -s 00000200 -n 20000 > device.out

  (where device is the device node (/dev/sd##) of each disk used by the snfs file system. The command must be run once per device, so you may want to write a shell script to collect this information.)

 

Example:

 

  hexdump -C /dev/sdm -s 00000200 -n 20000 > sdm.out
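As suggested above, the collection can be scripted. The following is a minimal sketch, not a DXi-supplied tool: the function name collect_expansion_info and the output file names are hypothetical, and the list of /dev/sd## devices must be supplied by hand (for example, the snfs devices that cvlabel -L reports). It runs the commands listed above, skipping the cvlabel and 3ware tools if they are not installed.

```shell
#!/bin/sh
# Hypothetical helper (not a DXi-supplied tool): save the output of the
# data-collection commands listed above into OUTDIR, so that "before" and
# "after" snapshots of an expansion can be compared.
# Usage: collect_expansion_info OUTDIR [/dev/sdX ...]
collect_expansion_info() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1

    # snfs labels as seen by cvlabel (skipped if the binary is missing).
    if [ -x /usr/cvfs/bin/cvlabel ]; then
        /usr/cvfs/bin/cvlabel -L > "$outdir/cvlabel-L.out" 2>&1
    fi

    # Unit/port map; only present on DXi systems with a 3ware controller.
    if [ -x /opt/DXi/3ware/3waretool.sh ]; then
        /opt/DXi/3ware/3waretool.sh --map > "$outdir/3ware-map.out" 2>&1
    fi

    # sysfs entries for the sd block devices.
    find /sys -name 'block:sd*' > "$outdir/sys-block.out" 2>/dev/null

    # Label area of each device given on the command line, using the same
    # hexdump arguments as the single-device example above.
    for dev in "$@"; do
        if [ ! -r "$dev" ]; then
            echo "skipping $dev: not readable" >&2
            continue
        fi
        hexdump -C -s 00000200 -n 20000 "$dev" > "$outdir/$(basename "$dev").out"
    done
    return 0
}
```

For example, run collect_expansion_info /root/expansion-before /dev/sdm /dev/sdq before the expansion, then repeat it with a different output directory afterwards so the two snapshots can be compared.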

 


 

Symptom (How to Identify the Problem)

 

The messages log shows events like the following (collected from an existing SR; your device information may be different):

 

(32484) Jun 24 15:15:02 ustchqtm1 systemupgrade: [CheckVolumes] : disable_foreign_volumes() - Disabling volume: /dev/sdm /c2/u0 0017522C9B7C2500F160 SSD

 

(32485) Jun 24 15:15:03 ustchqtm1 srvclogcli: E0000(1)<00008>:SRVCLOG RCOMP: 8 RINST: CheckVolumes VCOMP: 8 VINST: CheckVolumes VPINST: UNKNOWN EVENT: 41 TEXT: Found duplicate volume /c2/u0, disabling volume.  Ticket creation time: 06/24 15:15:03 CDT

 

(32486) Jun 24 15:15:03 ustchqtm1 systemupgrade: [CheckVolumes] : disable_foreign_volumes() - Disabling volume: /dev/sdq /c2/u2 YGJA3Z1A9B7C3400A650 DATA

 

(32489) Jun 24 15:15:04 ustchqtm1 srvclogcli: E0000(1)<00008>:SRVCLOG RCOMP: 8 RINST: CheckVolumes VCOMP: 8 VINST: CheckVolumes VPINST: UNKNOWN EVENT: 41 TEXT: Found duplicate volume /c2/u2, disabling volume.  Ticket creation time: 06/24 15:15:04 CDT
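To list which device nodes and units were disabled, the log lines above can be filtered. This is a sketch that assumes the exact message format shown above and a default log location of /var/log/messages (an assumption; pass the messages file from a collect bundle instead if needed). The function name list_disabled_volumes is hypothetical.

```shell
#!/bin/sh
# Sketch, not a DXi tool: print "<device> <unit>" for every volume that the
# CheckVolumes step disabled, based on the log format shown above.
# The default log path /var/log/messages is an assumption.
list_disabled_volumes() {
    log=${1:-/var/log/messages}
    # Basic regex: the parentheses in the pattern are matched literally.
    grep 'disable_foreign_volumes() - Disabling volume:' "$log" |
        sed 's/.*Disabling volume: *\([^ ]*\) *\([^ ]*\).*/\1 \2/'
}
```

Applied to the excerpt above, this would print /dev/sdm /c2/u0 and /dev/sdq /c2/u2, which are the devices to examine in the resolution steps.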

 

The following additional symptom may or may not appear on a 3ware array, depending on the port to which the EM is connected. In this case, consult PTR 33225 for the workaround or fix.

 

Before the rescan, the output of "tw_cli /c2 show" reports units u1 and u3 under enclosure 0 (e0):

 

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u1    RAID-1    OK             -       -       -       186.254   RiW    OFF
u3    RAID-6    OK             -       -       256K    14901.1   RiW    OFF

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p10   OK             u1   186.31 GB SATA  -   /c2/e0/slt0  INTEL SSDSA2BZ200G3
p11   OK             u1   186.31 GB SATA  -   /c2/e0/slt1  INTEL SSDSA2BZ200G3
p12   OK             u3   1.82 TB   SATA  -   /c2/e0/slt2  Hitachi HUA723020AL
p13   OK             u3   1.82 TB   SATA  -   /c2/e0/slt3  Hitachi HUA723020AL
p14   OK             u3   1.82 TB   SATA  -   /c2/e0/slt4  Hitachi HUA723020AL
p15   OK             u3   1.82 TB   SATA  -   /c2/e0/slt5  Hitachi HUA723020AL
p16   OK             u3   1.82 TB   SATA  -   /c2/e0/slt6  Hitachi HUA723020AL
p17   OK             u3   1.82 TB   SATA  -   /c2/e0/slt7  Hitachi HUA723020AL
p18   OK             u3   1.82 TB   SATA  -   /c2/e0/slt8  Hitachi HUA723020AL
p19   OK             u3   1.82 TB   SATA  -   /c2/e0/slt9  Hitachi HUA723020AL
p20   OK             u3   1.82 TB   SATA  -   /c2/e0/slt10 Hitachi HUA723020AL
p21   OK             u3   1.82 TB   SATA  -   /c2/e0/slt11 Hitachi HUA723020AL

 

After the rescan, the output of "tw_cli /c2 show" reports units u1 and u3 under enclosure 1 (e1) and the new units u0 and u2 under enclosure 0 (e0):

 

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       186.254   RiW    OFF
u1    RAID-1    OK             -       -       -       186.254   RiW    OFF
u2    RAID-6    OK             -       -       256K    14901.1   RiW    OFF
u3    RAID-6    OK             -       -       256K    14901.1   RiW    OFF

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p8    OK             u0   186.31 GB SATA  -   /c2/e0/slt0  STEC SSDSA M16ISD2-
p9    OK             u0   186.31 GB SATA  -   /c2/e0/slt1  STEC SSDSA M16ISD2-
p10   OK             u2   1.82 TB   SATA  -   /c2/e0/slt2  Hitachi HUA723020AL
p11   OK             u1   186.31 GB SATA  -   /c2/e1/slt0  INTEL SSDSA2BZ200G3
p12   OK             u1   186.31 GB SATA  -   /c2/e1/slt1  INTEL SSDSA2BZ200G3
p13   OK             u3   1.82 TB   SATA  -   /c2/e1/slt2  Hitachi HUA723020AL
p14   OK             u2   1.82 TB   SATA  -   /c2/e0/slt3  Hitachi HUA723020AL
p15   OK             u2   1.82 TB   SATA  -   /c2/e0/slt4  Hitachi HUA723020AL
p16   OK             u2   1.82 TB   SATA  -   /c2/e0/slt5  Hitachi HUA723020AL
p17   OK             u2   1.82 TB   SATA  -   /c2/e0/slt6  Hitachi HUA723020AL
p18   OK             u3   1.82 TB   SATA  -   /c2/e1/slt3  Hitachi HUA723020AL
p19   OK             u3   1.82 TB   SATA  -   /c2/e1/slt4  Hitachi HUA723020AL
p20   OK             u3   1.82 TB   SATA  -   /c2/e1/slt5  Hitachi HUA723020AL
p21   OK             u3   1.82 TB   SATA  -   /c2/e1/slt6  Hitachi HUA723020AL
p22   OK             u2   1.82 TB   SATA  -   /c2/e0/slt7  Hitachi HUA723020AL
p23   OK             u2   1.82 TB   SATA  -   /c2/e0/slt8  Hitachi HUA723020AL
p24   OK             u2   1.82 TB   SATA  -   /c2/e0/slt9  Hitachi HUA723020AL
p25   OK             u2   1.82 TB   SATA  -   /c2/e0/slt10 Hitachi HUA723020AL
p26   OK             u3   1.82 TB   SATA  -   /c2/e1/slt7  Hitachi HUA723020AL
p27   OK             u3   1.82 TB   SATA  -   /c2/e1/slt8  Hitachi HUA723020AL
p28   OK             u3   1.82 TB   SATA  -   /c2/e1/slt9  Hitachi HUA723020AL
p29   OK             u3   1.82 TB   SATA  -   /c2/e1/slt10 Hitachi HUA723020AL
p30   OK             u2   1.82 TB   SATA  -   /c2/e0/slt11 Hitachi HUA723020AL
p31   OK             u3   1.82 TB   SATA  -   /c2/e1/slt11 Hitachi HUA723020AL
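Comparing the two listings by eye is error-prone on a full array. The sketch below reduces saved "tw_cli /c2 show" output to one "unit enclosure" line per pair. It is a hypothetical helper (unit_enclosures is not a DXi or 3ware tool) and assumes, as in the output above, that port rows start with pN, that the unit is the third column, and that the Encl-Slot column looks like /c2/e1/slt0.

```shell
#!/bin/sh
# Sketch: from saved "tw_cli /c2 show" output, print one "<unit> <enclosure>"
# line per unit/enclosure pair found in the VPort table.
# Assumes port rows start with pN and the unit is the third column.
unit_enclosures() {
    awk '
        $1 ~ /^p[0-9]+$/ {
            for (i = 2; i <= NF; i++) {
                # Encl-Slot column, for example /c2/e1/slt0
                if ($i ~ /^\/c[0-9]+\/e[0-9]+\/slt[0-9]+$/) {
                    split($i, part, "/")    # part[3] is the enclosure, e.g. e1
                    key = $3 " " part[3]
                    if (!(key in seen)) { seen[key] = 1; print key }
                }
            }
        }' "$1" | sort
}
```

For example, save the output before and after the rescan as before.txt and after.txt, run unit_enclosures on each, and diff the results. With the listings above, u1 and u3 move from e0 to e1 while the new units u0 and u2 appear on e0.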

 

******

 

Resolution (Workaround or Fix)

 

There are two options to solve this issue.

 

Option 1:

If allowed, request a different EM and perform the expansion with it.

  

Option 2:

 

Important Notes Before You Use This Option:

 

A command is available to remove the label from a LUN. Before removing the label, however, you must make sure that the LUNs from the new EM have not yet been added to the snfs file system (see the checks below).

 

If you have doubts, consult your senior engineer or escalate to a backline engineer, providing the information that you collected as described under Additional Information.

 

How to check that the LUNs from the EM have not been added yet:

 

The snfs file system is expanded with the cvupdatefs binary. Information about the expansion is logged in the file expandFS.log (collect gathers this file into the app-info directory).

 

  1. Look at the expandFS.log file and confirm that the expansion has not been done. If it has been done, escalate the case.
  2. Also check the vol0.cfg file (gathered by collect into the snfs-info directory) to confirm that the LUNs of the EM have not been added to the snfs file system.
  3. Once you have confirmed that the snfs file system is clean (steps 1 and 2) and contains no reference to the LUNs of the new EM, continue with the steps below.
  4. Check whether mapfile.txt lists the new LUNs with labels.

Two conditions are possible:

 

4.1. The new LUNs of the new EM listed in mapfile.txt have labels, but the output of cvlabel -L shows no labels on those LUNs.

 

        -- If this is the case, you can remove the new LUN entries from mapfile.txt, reboot the DXi, and try the expansion again.

 

4.2. The new LUNs of the new EM listed in mapfile.txt have labels, and the output of cvlabel -L also shows labels on those LUNs.

 

        -- If this is the case, proceed with the next step.
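The file checks in steps 1 and 2 can be partly scripted against an extracted collect bundle. This is a minimal sketch under assumptions: the helper name check_new_luns is hypothetical, it assumes collect places expandFS.log under an app-info directory and vol0.cfg under an snfs-info directory (as described above), and it treats a whole-word match of a label name in vol0.cfg as meaning that LUN was already added. It does not interpret expandFS.log; review that file yourself.

```shell
#!/bin/sh
# Hypothetical helper for steps 1 and 2 (not a DXi tool). Arguments: an
# extracted collect bundle, then the label names of the new EM LUNs (as
# reported by cvlabel -L). Returns 0 if no label appears in vol0.cfg,
# 1 if any label appears, 2 if vol0.cfg cannot be found.
check_new_luns() {
    bundle=$1
    shift
    log=$(find "$bundle" -path '*app-info*' -name expandFS.log | head -n 1)
    cfg=$(find "$bundle" -path '*snfs-info*' -name vol0.cfg | head -n 1)
    echo "expandFS.log: ${log:-not found} (review it manually for step 1)"
    if [ -z "$cfg" ]; then
        echo "vol0.cfg not found under $bundle" >&2
        return 2
    fi
    found=0
    for label in "$@"; do
        # -w: whole-word match, so a label ending in _1 does not match _12
        if grep -qw -- "$label" "$cfg"; then
            echo "IN vol0.cfg: $label (already added; do not continue)"
            found=1
        else
            echo "not in vol0.cfg: $label"
        fi
    done
    return "$found"
}
```

A non-zero return means at least one new LUN already appears in the configuration, in which case the label must not be removed and the case should be escalated.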

 

  5. Remove the snfs label.

ATTENTION: Do not execute this command on the wrong LUN:

 

# cvlabel -U /dev/sd##

 
Where /dev/sd## is the device node of the new LUN (from the new EM) identified in this case, that is, the LUN that already had a label during the expansion process.
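Because running cvlabel -U on the wrong LUN is destructive, some engineers prefer a confirmation step. The following is a hypothetical wrapper (safe_unlabel is not a DXi or StorNext tool). It cannot tell which LUN is correct; it only checks that the argument is a block device, shows what cvlabel -L reports for it (assuming cvlabel -L lines start with the device node), and requires the device name to be retyped before calling cvlabel -U. cvlabel itself may also ask for confirmation.

```shell
#!/bin/sh
# Hypothetical safety wrapper around "cvlabel -U" (not a DXi tool).
# It does NOT verify that this is the right LUN; that is still up to you.
safe_unlabel() {
    dev=$1
    if [ ! -b "$dev" ]; then
        echo "refusing: $dev is not a block device" >&2
        return 1
    fi
    # Assumption: cvlabel -L output lines begin with the device node.
    /usr/cvfs/bin/cvlabel -L 2>/dev/null | grep -E "^$dev([[:space:]]|$)" ||
        echo "cvlabel -L shows no entry for $dev"
    printf 'Retype %s to remove its snfs label: ' "$dev"
    read -r answer || return 1
    if [ "$answer" != "$dev" ]; then
        echo "aborted: device names do not match" >&2
        return 1
    fi
    /usr/cvfs/bin/cvlabel -U "$dev"
}
```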

 

  6. At this point, reboot and retry the expansion. If the expansion still fails, repeat the steps above and then clean the JBOD (the new EM). The procedure to clean the JBOD is available in the Field Service Manual; make sure you do not hit the issue reported under bug 33225.

Additional Note:

 

On the DXi, the following file may be created during expansion:

 

       /opt/DXi/FilesystemExpansionInProgress

 

It is an empty file.

 

If the expansion process was interrupted but nothing else went wrong, and you need to continue the expansion, create this file with the touch command; the DXi will then try to resume the expansion on the next reboot.

 

While following the procedure above, you may want to check whether this file exists.
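A minimal sketch of that check follows. The function name expansion_flag_status is hypothetical; it defaults to the /opt/DXi/FilesystemExpansionInProgress path named above and only reports whether the file exists. Creating the file is left as a commented-out command because it should be done only in the interrupted-expansion case described above.

```shell
#!/bin/sh
# Sketch: report whether the expansion-in-progress flag file exists.
# Defaults to the path named above; pass another path to test elsewhere.
expansion_flag_status() {
    flag=${1:-/opt/DXi/FilesystemExpansionInProgress}
    if [ -e "$flag" ]; then
        echo "present: $flag"
        ls -l "$flag"
        return 0
    fi
    echo "not present: $flag"
    return 1
}

# Only if the expansion was interrupted and nothing else went wrong:
#   touch /opt/DXi/FilesystemExpansionInProgress
# The DXi then tries to resume the expansion on the next reboot.
```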



This page was generated by the BrainKeeper Enterprise Wiki, © 2018