This topic describes how to track down disk errors in the collect log. Important fields are shown in RED text.
- Go to /os-info/messages, which lists the disk errors reported by the kernel. Search for sense and note the device name of the error to investigate (sdaq in this example).
-bash-3.2$ grep sense messages
Jan 26 15:55:50 SM74D001 kernel: sdw: Current: sense key: Recovered Error
Jan 29 14:28:41 SM74D001 kernel: sdw: Current: sense key: Recovered Error
Jan 30 16:50:40 SM74D001 kernel: sdax: Current: sense key: Recovered Error
Jan 30 17:22:18 SM74D001 kernel: sdn: Current: sense key: Recovered Error
Feb 1 18:15:13 SM74D001 kernel: sdah: Current: sense key: Recovered Error
Feb 9 06:39:31 SM74D001 kernel: sdn: Current: sense key: Recovered Error
Feb 9 18:53:44 SM74D001 kernel: sdak: Current: sense key: Recovered Error
Feb 9 21:22:02 SM74D001 kernel: sdm: Current: sense key: Recovered Error
Feb 11 16:20:31 SM74D001 kernel: sdaq: Current: sense key: Recovered Error
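When the messages file is long, it can help to tally the sense-key errors per device before picking one to trace. The following is a minimal Python sketch of that idea, assuming the kernel lines have the same shape as the grep output above; the function name count_sense_errors and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: sdw: Current: sense key: Recovered Error"
SENSE_RE = re.compile(r"kernel: (sd[a-z]+): .*sense key: (.+)$")

def count_sense_errors(lines):
    """Return a Counter keyed by (device, sense key)."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Sample lines copied from the grep output above.
sample = [
    "Jan 26 15:55:50 SM74D001 kernel: sdw: Current: sense key: Recovered Error",
    "Jan 29 14:28:41 SM74D001 kernel: sdw: Current: sense key: Recovered Error",
    "Feb 11 16:20:31 SM74D001 kernel: sdaq: Current: sense key: Recovered Error",
]
counts = count_sense_errors(sample)
for (device, key), total in sorted(counts.items()):
    print(device, key, total)
```

To run it against a real collect, pass the open messages file instead of the sample list.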
- Open /snfs-info/nssdgb.out, go to the bottom of the file, and search backward for the last instance of the device name (sdaq).
sdaq:
[1024 12:29:08] 0x2b93ff0b7b00 NOTICE PortMapper: CVFS Volume Cvfs_200600A0B850D8B2_4_19 on device: /dev/sdaq (blk 0x42a0 raw 0x42a0) con: 14 lun: 4 state: 0x204 inquiry [LSI VirtualDisk 0760] controller # '200400A0B850D912' serial # '600A0B800050D8B200000A3E4A8C0121' Size: 11500910559 Sector Size: 512
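The fields needed for the remaining steps (device, LUN, controller number, and serial number) can be pulled out of the PortMapper line with a regular expression. This is a rough sketch that assumes the line layout shown above; parse_portmapper is a hypothetical helper name, not part of any tool.

```python
import re

# Field layout is assumed from the single PortMapper line shown above.
PORTMAPPER_RE = re.compile(
    r"on device: (?P<device>\S+) .*?lun: (?P<lun>\d+) .*?"
    r"controller # '(?P<controller>[0-9A-Fa-f]+)' "
    r"serial # '(?P<serial>[0-9A-Fa-f]+)'"
)

def parse_portmapper(line):
    """Return a dict of device, lun, controller, serial, or None if no match."""
    match = PORTMAPPER_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["lun"] = int(fields["lun"])
    return fields

line = (
    "[1024 12:29:08] 0x2b93ff0b7b00 NOTICE PortMapper: CVFS Volume "
    "Cvfs_200600A0B850D8B2_4_19 on device: /dev/sdaq (blk 0x42a0 raw 0x42a0) "
    "con: 14 lun: 4 state: 0x204 inquiry [LSI VirtualDisk 0760] "
    "controller # '200400A0B850D912' "
    "serial # '600A0B800050D8B200000A3E4A8C0121' "
    "Size: 11500910559 Sector Size: 512"
)
info = parse_portmapper(line)
print(info)
```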
- The last four hex digits of the controller number above (D912) appear as d9:12 in the array's world-wide node identifier. Search the /hw-info/array* files for d9:12 to find the array in question.
-bash-3.2$ grep d9:12 array*
array4a: World-wide node identifier: 20:04:00:a0:b8:50:d9:12
array4a: World-wide node identifier: 20:04:00:a0:b8:50:d9:12
array4a: MAC address: 00:a0:b8:50:d9:12
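The controller number in nssdgb.out and the world-wide node identifier in the array files appear to be the same value written two ways: uppercase hex without separators versus lowercase hex in colon-separated pairs. The sketch below shows that conversion under that assumption, which holds for the sample output above; controller_to_wwn is an illustrative name.

```python
def controller_to_wwn(controller):
    """Convert '200400A0B850D912' to '20:04:00:a0:b8:50:d9:12'."""
    hex_digits = controller.lower()
    # Split the string into two-character pairs and join them with colons.
    pairs = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
    return ":".join(pairs)

wwn = controller_to_wwn("200400A0B850D912")
# The last two pairs form the short search string used in the grep above.
search_key = ":".join(wwn.split(":")[-2:])
print(wwn, search_key)
```

Searching for the full identifier rather than only the last two pairs reduces the chance of a false match when several arrays are present.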
- The nssdgb.out output reports this device as lun: 4. Search the mappings in the array4a configuration file in hw-info to find the volume at LUN 4.
LUN 4:
Volume Name    LUN  Controller  Accessible by  Volume status
Access Volume  7    A,B         Storage Array  Optimal
TRAY_0_VOL_1   3    B           Storage Array  Optimal
TRAY_0_VOL_2   4    B           Storage Array  Optimal
TRAY_0_VOL_3   5    B           Storage Array  Optimal
TRAY_1_VOL_1   6    A           Storage Array  Optimal
TRAY_1_VOL_2   8    A           Storage Array  Optimal
TRAY_1_VOL_3   9    A           Storage Array  Optimal
TRAY_2_VOL_1   10   B           Storage Array  Optimal
TRAY_2_VOL_2   11   B           Storage Array  Optimal
TRAY_2_VOL_3   12   B           Storage Array  Optimal
TRAY_85_VOL_1  0    A           Storage Array  Optimal
TRAY_85_VOL_2  1    A           Storage Array  Optimal
TRAY_85_VOL_3  2    A           Storage Array  Optimal
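The LUN-to-volume lookup can also be scripted. The sketch below assumes each mapping row has the five columns shown above (volume name, LUN, controller, accessible by, status) separated by whitespace; volume_for_lun is a hypothetical helper, and the pattern has only been checked against these sample rows.

```python
import re

# name, LUN, controller (A, B, or A,B), accessible-by, status
ROW_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<lun>\d+)\s+(?P<controller>[AB](?:,[AB])?)\s+"
    r"(?P<host>.+?)\s+(?P<status>\S+)$"
)

def volume_for_lun(rows, lun):
    """Return the volume name mapped to the given LUN, or None."""
    for row in rows:
        match = ROW_RE.match(row.strip())
        if match and int(match.group("lun")) == lun:
            return match.group("name")
    return None

rows = [
    "Volume Name LUN Controller Accessible by Volume status",
    "Access Volume 7 A,B Storage Array Optimal",
    "TRAY_0_VOL_1 3 B Storage Array Optimal",
    "TRAY_0_VOL_2 4 B Storage Array Optimal",
    "TRAY_0_VOL_3 5 B Storage Array Optimal",
]
volume = volume_for_lun(rows, 4)
print(volume)
```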
- The array4a configuration file in hw-info also shows the drives associated with TRAY_0_VOL_2.
TRAY_0_VOL_2:
Associated volumes and free capacity
Volume Capacity
TRAY_0_VOL_1 102.000 GB
TRAY_0_VOL_2 5.356 TB
Associated drives - present (in piece order)
Tray Slot
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
- Search the array-4.log file (majorEventLog.txt in the LSI collect) for errors logged around the time of the disk errors against any of the drives in the volume (Tray 0, Slots 2 through 9).
--
Date/Time: 2/12/12 5:26:17 AM
Sequence number: 5769
Event type: 100A
Event category: Error
Priority: Informational
Description: Drive returned CHECK CONDITION
Event specific codes: b/88/3
Component type: Drive
Component location: Tray 0, Slot 9 <-----------
Logged by: Controller in slot B
--
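Filtering the event log by hand gets tedious when the volume spans many drives. The sketch below shows one way to do it in Python, assuming the records are "Key: Value" lines delimited by "--" lines as in the excerpt above; parse_events and events_for_drives are illustrative names, and the real log layout may differ.

```python
import re

LOCATION_RE = re.compile(r"Tray (\d+), Slot (\d+)")

def parse_events(text):
    """Split the log into records, one dict of field name to value each."""
    events, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("--"):
            # A separator line closes the record collected so far.
            if current:
                events.append(current)
            current = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    if current:
        events.append(current)
    return events

def events_for_drives(events, drives):
    """Keep events whose component location is one of the (tray, slot) pairs."""
    hits = []
    for event in events:
        match = LOCATION_RE.search(event.get("Component location", ""))
        if match and (int(match.group(1)), int(match.group(2))) in drives:
            hits.append(event)
    return hits

sample_log = """--
Date/Time: 2/12/12 5:26:17 AM
Sequence number: 5769
Event type: 100A
Description: Drive returned CHECK CONDITION
Component type: Drive
Component location: Tray 0, Slot 9
Logged by: Controller in slot B
--"""

# Drives in TRAY_0_VOL_2 from the step above: Tray 0, Slots 2 through 9.
volume_drives = {(0, slot) for slot in range(2, 10)}
hits = events_for_drives(parse_events(sample_log), volume_drives)
for event in hits:
    print(event["Date/Time"], event["Component location"], event["Description"])
```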