Tracking Down Disk Errors in the Collect Log (DRAFT)

This topic describes how to track down disk errors in the collect log. Important fields are shown in RED text.

 

 
  1. Go to /os-info/messages, which shows the disk errors reported. Search it for sense errors and note the device name. This example follows the most recent one, sdaq. (TS: I edited this step. Please verify.)
-bash-3.2$ grep sense messages
Jan 26 15:55:50 SM74D001 kernel: sdw: Current: sense key: Recovered Error
Jan 29 14:28:41 SM74D001 kernel: sdw: Current: sense key: Recovered Error
Jan 30 16:50:40 SM74D001 kernel: sdax: Current: sense key: Recovered Error
Jan 30 17:22:18 SM74D001 kernel: sdn: Current: sense key: Recovered Error
Feb 1 18:15:13 SM74D001 kernel: sdah: Current: sense key: Recovered Error
Feb 9 06:39:31 SM74D001 kernel: sdn: Current: sense key: Recovered Error
Feb 9 18:53:44 SM74D001 kernel: sdak: Current: sense key: Recovered Error
Feb 9 21:22:02 SM74D001 kernel: sdm: Current: sense key: Recovered Error
Feb 11 16:20:31 SM74D001 kernel: sdaq: Current: sense key: Recovered Error 
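
When the messages file is long, it can help to count the sense errors per device instead of reading the raw grep output. The following is a minimal sketch, not part of the original procedure. The function name sense_devices is hypothetical, and it assumes the syslog layout shown above, where the device name is the sixth whitespace-separated field.

```shell
# Count "sense key" lines per device, most frequent first.
# Assumption: syslog layout as in the sample above, so the device
# name (for example "sdaq:") is the sixth whitespace-separated field.
sense_devices() {
    awk '/sense key/ {
             dev = $6
             sub(/:$/, "", dev)      # drop the trailing colon
             count[dev]++
         }
         END { for (d in count) print count[d], d }' |
        LC_ALL=C sort -k1,1nr -k2,2
}
```

Run it from the os-info directory as: sense_devices < messages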
 
  1. Open /snfs-info/nssdbg.out. Go to the bottom of the file and search backward for the last instance of the device (sdaq). Note the controller number and the lun value.
sdaq:
 
[1024 12:29:08] 0x2b93ff0b7b00 NOTICE PortMapper: CVFS Volume Cvfs_200600A0B850D8B2_4_19 on device: /dev/sdaq (blk 0x42a0 raw 0x42a0) con: 14 lun: 4 state: 0x204 inquiry [LSI     VirtualDisk     0760] controller # '200400A0B850D912' serial # '600A0B800050D8B200000A3E4A8C0121' Size: 11500910559 Sector Size: 512
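
The lookup in this step can also be scripted. This is a sketch under assumptions, not a supported tool. The function name last_device_entry is hypothetical, and it assumes every PortMapper line has the same layout as the sample above ("device: /dev/NAME ", then "lun: N", then "controller # '...'").

```shell
# Print the controller number and LUN from the last line that
# mentions the given device. Reads the log on stdin.
# Assumption: PortMapper lines are laid out like the sample above.
last_device_entry() {
    # The trailing space in the pattern keeps sdaq from matching sdaqa.
    grep "device: /dev/$1 " | tail -n 1 |
        sed -n "s/.* lun: \([0-9]*\) .*controller # '\([0-9A-Fa-f]*\)'.*/controller=\2 lun=\1/p"
}
```

Feed it the file from this step on stdin and pass the device name as the argument, for example: last_device_entry sdaq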
                                                               
 
  1. Take the last four digits of the controller number above (D912) and search the /hw-info/array* files for d9:12 to find the array in question. In this example the match is array4a.
-bash-3.2$ grep d9:12 array*
array4a:         World-wide node identifier: 20:04:00:a0:b8:50:d9:12
array4a:         World-wide node identifier: 20:04:00:a0:b8:50:d9:12
array4a:         MAC address:                00:a0:b8:50:d9:12
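
Searching for d9:12 alone also matches the MAC address line. A narrower search uses the whole controller number, rewritten in the colon-separated lowercase form that the array files use. The helper below is an illustrative sketch, and the name wwn_colon is hypothetical.

```shell
# Rewrite a controller number in the colon-separated lowercase form
# used in the array files:
#   200400A0B850D912 -> 20:04:00:a0:b8:50:d9:12
wwn_colon() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | sed 's/../&:/g; s/:$//'
}
```

From the hw-info directory, the search then becomes: grep "$(wwn_colon 200400A0B850D912)" array*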
 
  1. The nssdbg.out entry reports this device as lun: 4. Search the mappings in the array4a configuration file in hw-info to find the volume at LUN 4.
LUN 4:
            Volume Name     LUN   Controller   Accessible by   Volume status
            Access Volume   7     A,B          Storage Array   Optimal
            TRAY_0_VOL_1    3     B            Storage Array   Optimal
            TRAY_0_VOL_2    4     B            Storage Array   Optimal
            TRAY_0_VOL_3    5     B            Storage Array   Optimal
            TRAY_1_VOL_1    6     A            Storage Array   Optimal
            TRAY_1_VOL_2    8     A            Storage Array   Optimal
            TRAY_1_VOL_3    9     A            Storage Array   Optimal
            TRAY_2_VOL_1    10    B            Storage Array   Optimal
            TRAY_2_VOL_2    11    B            Storage Array   Optimal
            TRAY_2_VOL_3    12    B            Storage Array   Optimal
            TRAY_85_VOL_1   0     A            Storage Array   Optimal
            TRAY_85_VOL_2   1     A            Storage Array   Optimal
            TRAY_85_VOL_3   2     A            Storage Array   Optimal
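
The LUN-to-volume lookup can be sketched as a small filter over the mappings table. This is an assumption-laden example: the name volume_for_lun is hypothetical, and it expects only the mappings rows on stdin, in the layout shown above, where the LUN is the first purely numeric field after the volume name.

```shell
# Print the volume name mapped to the given LUN.
# Assumption: stdin holds only the mappings table shown above.
# Other sections of the array file also contain numeric columns
# and would produce false matches.
volume_for_lun() {
    awk -v lun="$1" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9]+$/) {           # first numeric field is the LUN
                    if (($i + 0) == (lun + 0)) {
                        name = $1                # rebuild names with spaces
                        for (j = 2; j < i; j++) name = name " " $j
                        print name
                    }
                    break
                }
            }
        }'
}
```

With the table above on stdin, volume_for_lun 4 prints TRAY_0_VOL_2.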
 
 
  1. The array4a configuration file in hw-info also shows the drives associated with TRAY_0_VOL_2. Note the tray and slot of each drive.
TRAY_0_VOL_2:
      Associated volumes and free capacity
 
 
         Volume              Capacity
         TRAY_0_VOL_1     102.000 GB
         TRAY_0_VOL_2     5.356 TB
 
 
      Associated drives - present (in piece order)
 
 
         Tray     Slot
            0        2
            0        3
            0        4
            0        5
            0        6
            0        7
            0        8
            0        9
 
  1. Search the array-4.log file (majorEventLog.txt in the LSI collect) for errors logged around the time of the sense errors against any of the drives in the volume (Tray 0, Slots 2 through 9 in this example).
--
Date/Time: 2/12/12 5:26:17 AM
Sequence number: 5769
Event type: 100A
Event category: Error
Priority: Informational
Description: Drive returned CHECK CONDITION
Event specific codes: b/88/3
Component type: Drive
Component location: Tray 0, Slot 9    <-----------
Logged by: Controller in slot B
--
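
Filtering the event log down to the drives in the volume can be sketched as follows. This is not the author's method, only one possible approach. The name events_for_drives is hypothetical, and it assumes each record starts with a "Date/Time:" line and carries a "Component location: Tray N, Slot M" line, as in the excerpt above.

```shell
# Print every event record whose component location is the given
# tray and one of the given slots. Reads the event log on stdin.
# Assumption: records start with "Date/Time:" as in the excerpt above.
events_for_drives() {
    awk -v tray="$1" -v slots="$2" '
        function emit() { if (hit) printf "%s", rec; rec = ""; hit = 0 }
        BEGIN {
            n = split(slots, s, " ")
            for (i = 1; i <= n; i++) want["Tray " tray ", Slot " s[i]] = 1
        }
        /^Date\/Time:/ { emit(); inrec = 1 }     # a new record begins
        $0 == "--"     { next }                  # skip separator lines
        inrec          { rec = rec $0 "\n" }
        /^Component location:/ && match($0, /Tray [0-9]+, Slot [0-9]+/) {
            loc = substr($0, RSTART, RLENGTH)
            if (loc in want) hit = 1
        }
        END { emit() }'
}
```

For the volume in this example: events_for_drives 0 "2 3 4 5 6 7 8 9" < array-4.log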
 

 


