Hardware: Data loss on DXi due to punctures on the stripe |
SR Information: 1562354 Product / Software Version: 2.2.1 - Issue found on DXi 45xx and possibly detectable in other platforms (excluding issue related to bug 33488) Problem Description: Data Loss during rebuild due to puncture stripe Related PTRs: |
OVERVIEW
This article intend to explain a punctured stripe condition and how to identify under a perc log (present on 25xx, 45xx and 85xx)
SYMPTOM (HOW TO IDENTIFY THE PROBLEM)
What is punctured stripe
Some people also call punctured disk or raid
During the execution of a raid rebuild, if media errors are found on the drive which is the source of the data used to rebuild the raid, the controller will finish the rebuild anyway but you may be exposed to the following data corruption/loss condition:
Source disk - disk existing under the raid which has the data or parity data used to rebuild a raid when a failed disk is replaced
Target disk - the disk that is replacing the failed disk
Perc log reading
In this case the customer hit (bug 33848) but in addition to the disk failure, some of the disks were presenting media errors when the failed disks were replaced. This caused the puncture on the raid.
Note that the analysis of the perc log follows the timeline of the events
You don't need to read all the lines, just the colored lines that points the event described in each of the item.
PD - physical disk
2310 04/26/13 15:59:24: EVT#12504-04/26/13 15:59:24: 44=Time established as 04/26/13 15:59:24; (64 seconds since power on)
2311 04/26/13 16:00:00: EVT#12505-04/26/13 16:00:00: 149=Battery temperature is normal
2312 04/26/13 16:00:00: EVT#12506-04/26/13 16:00:00: 163=Current capacity of the battery is above threshold
2313 04/26/13 16:00:00: EVT#12507-04/26/13 16:00:00: 242=Battery charge complete
2314 04/26/13 16:00:00: findMsiState: Setting MSI Mode to 0 (0=Legacy, 1=MSI, 2=MSI-X)
2315 04/26/13 16:00:00: MfiCmdInitQueue[0]: FW now OPERATIONAL
2316 04/26/13 16:00:00: q.flags.mfaIs64Bits=0, q.flags.contextIs64Bits=0
2317 04/26/13 16:00:00: q.responseQueueEntries=3f0, responseQueueStatr=37e54000
2318 04/26/13 16:00:00: q.producerIndexPtr=180f3000, q.consumerIndexPtr=180f4000
2319 04/26/13 16:00:00: producerIndex=0
2320 04/26/13 16:03:18: EVT#12508-04/26/13 16:03:18: 44=Time established as 04/26/13 16:03:18; (299 seconds since power on)
2321 04/26/13 16:03:18: EVT#12509-04/26/13 16:03:18: 73=VD 02/2 Properties updated to [ID=02,dcp=6d,ccp=6d,ap=0,dc=2,dbgi=0,S=0|0]
2322 04/26/13 16:03:18: EVT#12510-04/26/13 16:03:18: 73=VD 01/1 Properties updated to [ID=01,dcp=6d,ccp=6d,ap=0,dc=2,dbgi=0,S=0|0]
2323 04/26/13 16:03:18: EVT#12511-04/26/13 16:03:18: 73=VD 00/0 Properties updated to [ID=00,dcp=6d,ccp=6d,ap=0,dc=2,dbgi=0,S=0|0]
2324 04/26/13 16:03:18: EVT#12512-04/26/13 16:03:18: 73=VD 00/0 Properties updated to [ID=00,dcp=61,ccp=6d,ap=0,dc=2,dbgi=0,S=0|0]
2325 04/26/13 16:03:18: EVT#12513-04/26/13 16:03:18: 73=VD 01/1 Properties updated to [ID=01,dcp=61,ccp=6d,ap=0,dc=2,dbgi=0,S=0|0]
2326 04/26/13 16:03:19: EVT#12514-04/26/13 16:03:19: 73=VD 02/2 Properties updated to [ID=02,dcp=61,ccp=6d,ap=0,dc=2,dbgi=0,S=0|0]
2327 04/26/13 16:03:19: EVT#12515-04/26/13 16:03:19: 303=Controller properties changed
2328 04/26/13 16:03:19: EVT#12516-04/26/13 16:03:19: 303=Controller properties changed
2329 04/26/13 16:06:23: DM: Chip 0 Paused
2330 04/26/13 16:06:23: Chip <0> Slots: Cur=[193]
2331 04/26/13 16:06:23: [c3]= 9
2332 04/26/13 16:06:25: DM: Timing wheel expired - Chip 0 Slot c3
2333 04/26/13 16:06:25: Chip <0> Slots: Cur=[195]
2334 04/26/13 16:06:25: [c3]= 9
2335 04/26/13 16:06:25: Ctio Issuing Target Reset on Pd 4 Mid 532
2336 04/26/13 16:06:25: DM_PL_AllocateTaskMgmtFrame: TM mid=800 pTaskMgmt=c04d4400
2337 04/26/13 16:06:25: Ctio Issuing Target Reset on Pd 4 Mid 5b7
2338 04/26/13 16:06:25: Ctio Issuing Target Reset on Pd 4 Mid 5be
2339 04/26/13 16:06:25: Ctio Issuing Target Reset on Pd 4 Mid 5c2
2340 04/26/13 16:06:25: Ctio Issuing Target Reset on Pd 4 Mid 5c5
2341 04/26/13 16:06:25: Ctio Issuing Target Reset on Pd 4 Mid 5de
2342 04/26/13 16:06:25: TIMEOUT: rdm 806ea600, devHandle 000d, mid 0603, Cmd 02, devId 0004, devState 0020, devFlags f1482005
2343 04/26/13 16:06:25: TIMEOUT: rdm 806eb000, devHandle 000d, mid 0692, Cmd 02, devId 0004, devState 0020, devFlags f1482005
2344 04/26/13 16:06:25: TIMEOUT: rdm 806eba00, devHandle 000d, mid 06d8, Cmd 02, devId 0004, devState 0020, devFlags f1482005
2356
2357 04/26/13 16:06:25: iopiEvent: EVENT_SAS_DISCOVERY
2358 04/26/13 16:06:25: DM_HandleDiscEvent: Discovery started on Port 3
2359
2360 04/26/13 16:06:25: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
2361 04/26/13 16:06:25: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x3
2362 04/26/13 16:06:25: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
2363 04/26/13 16:06:25: Phy changed - phy 03 devHandle 000d linkRate 99 curLinkRate 9
2364
2365 04/26/13 16:06:25: iopiEvent: EVENT_SAS_DISCOVERY
2366 04/26/13 16:06:25: DM_HandleDiscEvent: Discovery Completed on Port 3
2367
2368 04/26/13 16:06:25: Disc-prog= 0....resetProg=0 aenCount=0 transit=0
2369 04/26/13 16:06:26: EVT#12522-04/26/13 16:06:26: 113=Unexpected sense: PD 04(e0x20/s4) Path 4433221103000000, CDB: 15 10 00 00 10 00, Sense: 6/29/00
2370 04/26/13 16:06:26: Raw Sense for PD 4: 70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00
2371 04/26/13 16:06:26: DevId [4] Restore Queue Depth to 40
2372 04/26/13 16:06:38: DM: Chip 0 Paused
2373 04/26/13 16:06:38: Chip <0> Slots: Cur=[208]
2374 04/26/13 16:06:38: [d2]=2c
2375 04/26/13 16:06:40: DM: Timing wheel expired - Chip 0 Slot d2
2376 04/26/13 16:06:40: Chip <0> Slots: Cur=[210]
2377 04/26/13 16:06:40: [d2]=2c
2378 04/26/13 16:06:40: TIMEOUT: rdm 8075ce00, devHandle 000b, mid 025f, Cmd 02, devId 0006, devState 0020, devFlags f1400005
2379 04/26/13 16:06:40: DM_PL_AllocateTaskMgmtFrame: TM mid=801 pTaskMgmt=c04d4500
2380 04/26/13 16:06:40: TIMEOUT: rdm 8075dc00, devHandle 000b, mid 024c, Cmd 02, devId 0006, devState 0020, devFlags f1482005
2381 04/26/13 16:06:40: TIMEOUT: rdm 8075fc00, devHandle 000b, mid 0249, Cmd 01, devId 0006, devState 0020, devFlags f1482005
2465 04/26/13 16:06:40: EVT#12564-04/26/13 16:06:40: 267=Command timeout on PD 06(e0x20/s6) Path 4433221101000000, CDB: 28 00 17 47 48 80 00 00 80 00
2466 04/26/13 16:06:40: EVT#12565-04/26/13 16:06:40: 267=Command timeout on PD 06(e0x20/s6) Path 4433221101000000, CDB: 28 00 0f 06 39 00 00 01 00 00
2467 04/26/13 16:06:40: EVT#12566-04/26/13 16:06:40: 267=Command timeout on PD 06(e0x20/s6) Path 4433221101000000, CDB: 28 00 0f 06 3a 00 00 01 00 00
2468 04/26/13 16:06:40: EVT#12567-04/26/13 16:06:40: 268=PD 06(e0x20/s6) Path 4433221101000000 reset (Type 03)
2469 04/26/13 16:06:40: MPT_TaskMgmtPostRoutine: TMIdx 1 DevID[6] IOCStatus 0000 MsgAddr c04d4500 type 3 chip 0 devhandle 0 numCmdIssued 2 Qdepth 0 TermCount 2c DevWQC 2c ChipWQC 0 retryCount 0
2470 04/26/13 16:06:40: DevId [6] Reduce Queue Depth to 1 from 40
2471 04/26/13 16:06:40: EVT#12568-04/26/13 16:06:40: 113=Unexpected sense: PD 06(e0x20/s6) Path 4433221101000000, CDB: 2a 00 04 2d f7 80 00 00 80 00, Sense: 6/29/00
2472 04/26/13 16:06:40: Raw Sense for PD 6: 70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00
2473 04/26/13 16:06:40: EVT#12569-04/26/13 16:06:40: 113=Unexpected sense: PD 06(e0x20/s6) Path 4433221101000000, CDB: 00 00 00 00 00 00, Sense: 2/05/00
2474 04/26/13 16:06:40: Raw Sense for PD 6: 70 00 02 00 00 00 00 0a 00 00 00 00 05 00 00 00 00 00
2475 04/26/13 16:06:40: DM_REC: Not Handled Rdm(8076e000) DevId[6], SKEY=2, EXSNS=5 Q=0
2476 04/26/13 16:06:40: DM_REC: TUR Failed, devId =6
2477 04/26/13 16:06:40: EVT#12570-04/26/13 16:06:40: 87=Error on PD 06(e0x20/s6) (Error 02)
2478 04/26/13 16:06:40: EVT#12571-04/26/13 16:06:40: 114=State change on PD 06(e0x20/s6) from ONLINE(18) to FAILED(11)
2479 04/26/13 16:06:40: EVT#12572-04/26/13 16:06:40: 81=State change on VD 00/0 from OPTIMAL(3) to PARTIALLY DEGRADED(1)
2488 04/26/13 16:06:40: iopiEvent: EVENT_SAS_DISCOVERY
2489 04/26/13 16:06:40: DM_HandleDiscEvent: Discovery started on Port 1
2490
2491 04/26/13 16:06:40: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
2492 04/26/13 16:06:40: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x1
2493 04/26/13 16:06:40: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
2494 04/26/13 16:06:40: Phy changed - phy 01 devHandle 000b linkRate 99 curLinkRate 9
2495
2496 04/26/13 16:06:40: iopiEvent: EVENT_SAS_DISCOVERY
2497 04/26/13 16:06:40: DM_HandleDiscEvent: Discovery Completed on Port 1
2498
2499 04/26/13 16:06:40: Disc-prog= 0....resetProg=0 aenCount=0 transit=0
2500 04/26/13 16:10:14: DM: Chip 0 Paused
2501 04/26/13 16:10:14: Chip <0> Slots: Cur=[168]
2502 04/26/13 16:10:14: [aa]= 1 [b4]=56
2503 04/26/13 16:10:16: DM: Timing wheel expired - Chip 0 Slot aa
2504 04/26/13 16:10:16: Chip <0> Slots: Cur=[170]
2505 04/26/13 16:10:16: [aa]= 1
2506 04/26/13 16:10:16: Ctio Issuing Target Reset on Pd 5 Mid 57
2507 04/26/13 16:10:16: DM_PL_AllocateTaskMgmtFrame: TM mid=802 pTaskMgmt=c04d4600
2508 04/26/13 16:10:16: DM_TMWheelScanAllMsg (slot aa):Scanned 0, Timeout 0 Msg mask 21000
2509 04/26/13 16:10:16: EVT#12578-04/26/13 16:10:16: 268=PD 05(e0x20/s5) Path 4433221102000000 reset (Type 03)
2510 04/26/13 16:10:16: MPT_TaskMgmtPostRoutine: TMIdx 2 DevID[5] IOCStatus 0000 MsgAddr c04d4600 type 3 chip 0 devhandle 0 numCmdIssued 95 Qdepth 0 TermCount 1 DevWQC 1a ChipWQC 0 retryCount 0
2511 04/26/13 16:10:16: DevId [5] Reduce Queue Depth to 1 from 40
2512 04/26/13 16:10:16: EVT#12579-04/26/13 16:10:16: 113=Unexpected sense: PD 05(e0x20/s5) Path 4433221102000000, CDB: 28 00 04 2b 5f 80 00 00 80 00, Sense: 6/29/00
2513 04/26/13 16:10:16: Raw Sense for PD 5: 70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00
2514 04/26/13 16:10:16: iopiDiscoveryComplete SubSystem 2 Count 3 InitState 1
2515 04/26/13 16:10:16: Disabling UART for 120s due to discovery
2571 04/26/13 17:52:05: TIMEOUT: rdm 805d1c00, devHandle 000e, mid 02ba, Cmd 01, devId 0003, devState 0020, devFlags f1400005
2572 04/26/13 17:52:05: DM_PL_AllocateTaskMgmtFrame: TM mid=804 pTaskMgmt=c04d4800
2573 04/26/13 17:52:05: TIMEOUT: rdm 805d4c00, devHandle 000e, mid 01a8, Cmd 02, devId 0003, devState 0020, devFlags f1482005
2574 04/26/13 17:52:05: TIMEOUT: rdm 805e9200, devHandle 000e, mid 026b, Cmd 01, devId 0003, devState 0020, devFlags f1482005
2575 04/26/13 17:52:05: TIMEOUT: rdm 805eb800, devHandle 000e, mid 0331, Cmd 01, devId 0003, devState 0020, devFlags f1482005
2576 04/26/13 17:52:05: TIMEOUT: rdm 805eba00, devHandle 000e, mid 03d1, Cmd 01, devId 0003, devState 0020, devFlags f1482005
2577 04/26/13 17:52:05: TIMEOUT: rdm 805ece00, devHandle 000e, mid 05ad, Cmd 01, devId 0003, devState 0020, devFlags f1482005
2578 04/26/13 17:52:05: TIMEOUT: rdm 805f4800, devHandle 000e, mid 02fc, Cmd 01, devId 0003, devState 0020, devFlags f1482005
2579 04/26/13 17:52:05: DM_TMWheelScanAllMsg (slot 87):Scanned 68, Timeout 7 Msg mask 21000
2580 04/26/13 17:52:05: EVT#12585-04/26/13 17:52:05: 267=Command timeout on PD 03(e0x20/s3) Path 4433221104000000, CDB: 28 00 1f 46 0d 00 00 01 ff 00
2581 04/26/13 17:52:05: EVT#12586-04/26/13 17:52:05: 267=Command timeout on PD 03(e0x20/s3) Path 4433221104000000, CDB: 2a 00 05 f4 07 00 00 00 80 00
2582 04/26/13 17:52:05: EVT#12587-04/26/13 17:52:05: 267=Command timeout on PD 03(e0x20/s3) Path 4433221104000000, CDB: 28 00 08 9b 1f 80 00 00 80 00
2583 04/26/13 17:52:05: EVT#12588-04/26/13 17:52:05: 267=Command timeout on PD 03(e0x20/s3) Path 4433221104000000, CDB: 28 00 1f 46 0b 00 00 01 ff 00
2621 04/26/13 17:52:06: iopiEvent: EVENT_SAS_DISCOVERY
2622 04/26/13 17:52:06: DM_HandleDiscEvent: Discovery started on Port 4
2623
2624 04/26/13 17:52:06: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
2625 04/26/13 17:52:06: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x4
2626 04/26/13 17:52:06: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
2627 04/26/13 17:52:06: Phy changed - phy 04 devHandle 000e linkRate 99 curLinkRate 9
2628
2629 04/26/13 17:52:06: iopiEvent: EVENT_SAS_DISCOVERY
2630 04/26/13 17:52:06: DM_HandleDiscEvent: Discovery Completed on Port 4
2631
2632 04/26/13 17:52:06: Disc-prog= 0....resetProg=0 aenCount=0 transit=0
2633 04/26/13 17:52:06: EVT#12596-04/26/13 17:52:06: 113=Unexpected sense: PD 03(e0x20/s3) Path 4433221104000000, CDB: 15 10 00 00 10 00, Sense: 6/29/00
2634 04/26/13 17:52:06: Raw Sense for PD 3: 70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00
2635 04/26/13 17:52:07: DevId [3] Restore Queue Depth to 40
2636 04/26/13 17:52:08: DM: Chip 0 Paused
2637 04/26/13 17:52:08: Chip <0> Slots: Cur=[138]
2638 04/26/13 17:52:08: [8c]= 4 [96]=4d
2639 04/26/13 17:52:10: DM: Timing wheel expired - Chip 0 Slot 8c
2640 04/26/13 17:52:10: Chip <0> Slots: Cur=[140]
2641 04/26/13 17:52:10: [8c]= 4
2642 04/26/13 17:52:10: TIMEOUT: rdm 80602c00, devHandle 000d, mid 033f, Cmd 02, devId 0004, devState 0020, devFlags f1400005
2643 04/26/13 17:52:10: DM_PL_AllocateTaskMgmtFrame: TM mid=805 pTaskMgmt=c04d4900
2644 04/26/13 17:52:10: TIMEOUT: rdm 8061d800, devHandle 000d, mid 04b1, Cmd 02, devId 0004, devState 0020, devFlags f1482005
2645 04/26/13 17:52:10: TIMEOUT: rdm 80632c00, devHandle 000d, mid 05f2, Cmd 02, devId 0004, devState 0020, devFlags f1482005
2646 04/26/13 17:52:10: TIMEOUT: rdm 8065a800, devHandle 000d, mid 00b0, Cmd 02, devId 0004, devState 0020, devFlags f1482005
2647 04/26/13 17:52:10: DM_TMWheelScanAllMsg (slot 8c):Scanned 4, Timeout 4 Msg mask 21000
2648 04/26/13 17:52:10: EVT#12597-04/26/13 17:52:10: 267=Command timeout on PD 04(e0x20/s4) Path 4433221103000000, CDB: 2a 00 04 cb 24 80 00 00 80 00
2649 04/26/13 17:52:10: EVT#12598-04/26/13 17:52:10: 267=Command timeout on PD 04(e0x20/s4) Path 4433221103000000, CDB: 2a 00 04 cb 23 80 00 00 80 00
2650 04/26/13 17:52:10: EVT#12599-04/26/13 17:52:10: 267=Command timeout on PD 04(e0x20/s4) Path 4433221103000000, CDB: 2a 00 04 cb 24 00 00 00 80 00
2651 04/26/13 17:52:10: EVT#12600-04/26/13 17:52:10: 267=Command timeout on PD 04(e0x20/s4) Path 4433221103000000, CDB: 2a 00 04 cb 23 00 00 00 80 00
2659 04/26/13 17:52:10: iopiEvent: EVENT_SAS_DISCOVERY
2660 04/26/13 17:52:10: DM_HandleDiscEvent: Discovery started on Port 3
2661
2662 04/26/13 17:52:10: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
2663 04/26/13 17:52:10: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x3
2664 04/26/13 17:52:10: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
2665 04/26/13 17:52:10: Phy changed - phy 03 devHandle 000d linkRate 99 curLinkRate 9
2666
2667 04/26/13 17:52:10: iopiEvent: EVENT_SAS_DISCOVERY
2668 04/26/13 17:52:10: DM_HandleDiscEvent: Discovery Completed on Port 3
2669
2670 04/26/13 17:52:10: Disc-prog= 0....resetProg=0 aenCount=0 transit=0
2671 04/26/13 17:52:10: EVT#12603-04/26/13 17:52:10: 113=Unexpected sense: PD 04(e0x20/s4) Path 4433221103000000, CDB: 15 10 00 00 10 00, Sense: 6/29/00
2672 04/26/13 17:52:10: Raw Sense for PD 4: 70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00
2673 04/26/13 17:52:11: DevId [4] Restore Queue Depth to 40
2674 04/26/13 19:56:49: DevId [7] Reduce Queue Depth to 1 from 40
2675 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth 12
2676 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth 11
2677 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth 10
2678 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth f
2679 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth e
2680 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth d
2681 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth c
2682 04/26/13 19:56:49: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth b
2683 04/26/13 19:56:49: EVT#12604-04/26/13 19:56:49: 113=Unexpected sense: PD 07(e0x20/s7) Path 4433221100000000, CDB: 28 00 12 3e 2b 00 00 02 00 00, Sense: 3/11/00
2684 04/26/13 19:56:49: Raw Sense for PD 7: f0 00 03 00 00 00 00 0a 00 00 00 00 11 00 00 00 00 00
2685 04/26/13 19:56:49: DEV_REC:Medium Error DevId[7] devHandle a RDM=806cde00 retires=0
2686 04/26/13 19:56:49: MedErr is for: cmdId=2423, ld=2, src=4, cmd=1, lba=b9140400, cnt=80, rmwOp=0
2687 04/26/13 19:56:49: -> recoveryChild: ld=2 orgLi=0 recPhysArm=6 badPhysArm=ff doneFun=38 sRef=ff06 eRef=100 recFlags=0
2688 04/26/13 19:56:49: -> RecParent: cmdId=244f, src=7, cmd=2, lba=4b07c200, cnt=600, rmwOp=0, refs=0/1ff
2689 04/26/13 19:56:49: EVT#12605-04/26/13 19:56:49: 87=Error on PD 07(e0x20/s7) (Error 02)
2690 04/26/13 19:56:49: EVT#12606-04/26/13 19:56:49: 114=State change on PD 07(e0x20/s7) from ONLINE(18) to FAILED(11)
2691 04/26/13 19:56:49: EVT#12607-04/26/13 19:56:49: 81=State change on VD 00/0 from PARTIALLY DEGRADED(1) to DEGRADED(2)
2692 04/26/13 19:56:49: EVT#12608-04/26/13 19:56:49: 251=VD 00/0 is now DEGRADED
2693 04/26/13 19:56:49: EVT#12609-04/26/13 19:56:49: 81=State change on VD 01/1 from PARTIALLY DEGRADED(1) to DEGRADED(2)
2698 04/26/13 20:02:01: EVT#12613-04/26/13 20:02:01: 30=Event log cleared
2699 04/27/13 4:02:05: EVT#12614-04/27/13 4:02:05: 30=Event log cleared
2700 04/27/13 4:05:52: prDiskStart: starting Patrol Read on PD=00
2701 04/27/13 4:05:52: prDiskStart: starting Patrol Read on PD=01
2702 04/27/13 4:05:52: prDiskStart: starting Patrol Read on PD=02
2703 04/27/13 4:05:52: prDiskStart: starting Patrol Read on PD=03
2704 04/27/13 4:05:52: prDiskStart: starting Patrol Read on PD=04
2705 04/27/13 4:05:52: prDiskStart: starting Patrol Read on PD=05
2706 04/27/13 4:05:52: prDiskCheckOkToRun: PR cannot run on this pd=6 not a spare and online
2707 04/27/13 4:05:52: prDiskCheckOkToRun: PR cannot run on this pd=7 not a spare and online
2708 04/27/13 4:05:52: EVT#12615-04/27/13 4:05:52: 39=Patrol Read started
2709 04/27/13 6:46:52: EVT#12616-04/27/13 6:46:52: 113=Unexpected sense: PD 05(e0x20/s5) Path 4433221102000000, CDB: 2f 00 29 70 80 00 00 10 00 00, Sense: 3/11/00
2710 04/27/13 6:46:52: Raw Sense for PD 5: f0 00 03 29 70 8f 01 0a 00 00 00 00 11 00 00 00 00 00
2711 04/27/13 6:46:52: DEV_REC:Medium Error DevId[5] devHandle c RDM=805c6400 retires=0
2712 04/27/13 6:46:52: MedErr is for: cmdId=2400, ld=0, src=0, cmd=2, lba=203cdb, cnt=8, rmwOp=0
2713 04/27/13 6:46:52: ErrLBAOffset (f01) LBA(29708000) BadLba=29708f01
2714 04/27/13 6:46:52: prCallback: Medium Error on pd=05, StartLba=29708000, ErrLba=29708f01
2715 04/27/13 6:46:52: prRecQueue: starting pd=05 recovery - blocking host commands
2716 04/27/13 6:46:54: prRecGo: Ready to attempt recovery errLBA=29708f01 on pd=05
2717 04/27/13 6:46:54: prRecGo: dataErr found on ld=2, span=0, physArm=5, logArm=4, row=11d9d7, strip=6b1b0e, startRef=1,type=0
2718 04/27/13 6:46:54: prRecGo: no correction for degraded span - continue
4895 05/04/13 6:07:41: EVT#12841-05/04/13 6:07:41: 113=Unexpected sense: PD 05(e0x20/s5) Path 4433221102000000, CDB: 2f 00 29 72 93 83 00 10 00 00, Sense: 3/11/00
4896 05/04/13 6:07:41: Raw Sense for PD 5: f0 00 03 29 72 93 84 0a 00 00 00 00 11 00 00 00 00 00
4897 05/04/13 6:07:41: DEV_REC:Medium Error DevId[5] devHandle c RDM=80669400 retires=0
4898 05/04/13 6:07:41: MedErr is for: cmdId=2422, ld=0, src=0, cmd=1, lba=46500, cnt=300, rmwOp=0
4899 05/04/13 6:07:41: ErrLBAOffset (1) LBA(29729383) BadLba=29729384
4900 05/04/13 6:07:41: prCallback: Medium Error on pd=05, StartLba=29729383, ErrLba=29729384
4901 05/04/13 6:07:41: prRecQueue: starting pd=05 recovery - blocking host commands
4902 05/04/13 6:07:41: prRecGo: Ready to attempt recovery errLBA=29729384 on pd=05
4903 05/04/13 6:07:41: prRecGo: dataErr found on ld=2, span=0, physArm=5, logArm=6, row=11dad9, strip=11dad9, startRef=100000084,type=0
4904 05/04/13 6:07:41: prRecGo: no correction for degraded span - continue
4905 05/04/13 6:51:53: prCallback: PR completed for pd=02
4906 05/04/13 6:53:45: prCallback: PR completed for pd=01
4907 05/04/13 6:56:53: prCallback: PR completed for pd=00
4908 05/04/13 6:57:18: prCallback: PR completed for pd=03
4909 05/04/13 6:58:43: prCallback: PR completed for pd=04
4910 05/04/13 7:01:26: prCallback: PR completed for pd=05
4911 05/04/13 7:01:26: PR cycle complete
4912 05/04/13 7:01:26: EVT#12842-05/04/13 7:01:26: 35=Patrol Read complete
4913 05/04/13 7:01:26: Next PR scheduled to start at 05/11/13 4:05:52
4914 05/05/13 4:01:54: EVT#12843-05/05/13 4:01:54: 30=Event log cleared
4915 05/06/13 4:01:53: EVT#12844-05/06/13 4:01:53: 30=Event log cleared
4916 05/06/13 11:37:33: Disabling UART for 120s due to IDR on devH b
4917
4918 05/06/13 11:37:33: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
>> here PD 6, during replacement you will see the device reset occurring <<<
4919 05/06/13 11:37:33: DM_HandleDevStatusChgEvent: devHandle=x000b SASAdd=4433221101000000 TaskTag=xffff ASC=x00 ASCQ=x00 IOCLogInfo x31110d00 IOCStatus x8000 ReasonCode x08 - INTERNAL_DEVICE_RESET
4920 05/06/13 11:37:35: DevId [6] Reduce Queue Depth to 1 from 40
4922 05/06/13 11:37:35: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
4923 05/06/13 11:37:35: DM_HandleDevStatusChgEvent: devHandle=x000b SASAdd=4433221101000000 TaskTag=xffff ASC=x00 ASCQ=x00 IOCLogInfo x00000000 IOCStatus x0000 ReasonCode x0e - INTERNAL_DEVICE_RESET complete
4924 05/06/13 11:37:36: iopiDiscoveryComplete SubSystem 2 Count 8 InitState 1
4925 05/06/13 11:37:36: DevId [6] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth 0
4926
4959 05/06/13 11:37:37: EVT#12845-05/06/13 11:37:37: 112=Removed: PD 06(e0x20/s6)
4960 05/06/13 11:37:37: EVT#12846-05/06/13 11:37:37: 248=Removed: PD 06(e0x20/s6) Info: enclPd=20, scsiType=0, portMap=01, sasAddr=4433221101000000,0000000000000000
4961 05/06/13 11:37:37: EVT#12847-05/06/13 11:37:37: 114=State change on PD 06(e0x20/s6) from FAILED(11) to UNCONFIGURED_BAD(1)
4962 05/06/13 11:37:37: LoadBalance 0
4986 05/06/13 11:42:15: PD Flags State Type Size S N Vendor Product Rev P C ID SAS Addr Port Phy DevH BFw BRev
4987 05/06/13 11:42:15: --- -------- ----- ---- -------- - - -------- ---------------- ---- - - -- ---------------- ---- --- ---- ---- ----
4988 05/06/13 11:42:15: 0 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 00 4433221107000000 07 07 11 NA NA
4989 05/06/13 11:42:15: 1 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 01 4433221106000000 06 06 10 NA NA
4990 05/06/13 11:42:15: 2 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 02 4433221105000000 05 05 0f NA NA
4991 05/06/13 11:42:15: 3 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 03 4433221104000000 04 04 0e NA NA
4992 05/06/13 11:42:15: 4 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 04 4433221103000000 03 03 0d NA NA
4993 05/06/13 11:42:15: 5 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 05 4433221102000000 02 02 0c NA NA
4994 05/06/13 11:42:15: 6 f1400005 00020 00 3a38602f 1 1 0 ATA WDC WD5003ABYX-1 1S02 0 0 06 4433221101000000 01 01 0b NA NA
4995 05/06/13 11:42:15: 7 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 07 4433221100000000 00 00 0a NA NA
4996 05/06/13 11:42:15: 20 01400005 00020 0d 0 0 0 0 DP BACKPLANE 1.10 0 0 20 5942b0b0687eb700 00 08 09 NA NA
4997 05/06/13 11:42:15: 100 00400005 00020 03 0 0 0 0 LSI SMP/SGPIO/SEP 1526 0 0 ffff 0 00 ff 00 NA NA
4998
5008 05/06/13 11:42:15: EVT#12848-05/06/13 11:42:15: 91=Inserted: PD 06(e0x20/s6)
5009 05/06/13 11:42:15: EVT#12849-05/06/13 11:42:15: 247=Inserted: PD 06(e0x20/s6) Info: enclPd=20, scsiType=0, portMap=01, sasAddr=4433221101000000,0000000000000000
5010 05/06/13 11:42:15: ArDiskTypeMisMatch : NO_MIXING_VIOLATION array=0 destPD=6
5011 05/06/13 11:42:15: EVT#12850-05/06/13 11:42:15: 114=State change on PD 06(e0x20/s6) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)
5012 05/06/13 11:42:15: ArDiskTypeMisMatch : NO_MIXING_VIOLATION array=0 destPD=6
5013 05/06/13 11:42:15: EVT#12851-05/06/13 11:42:15: 114=State change on PD 06(e0x20/s6) from UNCONFIGURED_GOOD(0) to OFFLINE(10)
5014 05/06/13 11:42:15: EVT#12852-05/06/13 11:42:15: 106=Rebuild automatically started on PD 06(e0x20/s6)
5015 05/06/13 11:42:15: EVT#12853-05/06/13 11:42:15: 114=State change on PD 06(e0x20/s6) from OFFLINE(10) to REBUILD(14)
5049 05/06/13 12:05:11: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
5050 05/06/13 12:05:11: DM_HandleDevStatusChgEvent: devHandle=x000a SASAdd=4433221100000000 TaskTag=xffff ASC=x00 ASCQ=x00 IOCLogInfo x31110d00 IOCStatus x8000 ReasonCode x08 - INTERNAL_DEVICE_RESET
5051 05/06/13 12:05:12: SES_BackplaneMapping: Undetected device on enclPd 20 StsCode = 5 elmtType 17 elmtIndex 7 slotPd =7 SasAddr =4433221100000000
5052 05/06/13 12:05:14: DevId [7] Reduce Queue Depth to 1 from 40
5053
5054 05/06/13 12:05:14: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
5055 05/06/13 12:05:14: DM_HandleDevStatusChgEvent: devHandle=x000a SASAdd=4433221100000000 TaskTag=xffff ASC=x00 ASCQ=x00 IOCLogInfo x00000000 IOCStatus x0000 ReasonCode x0e - INTERNAL_DEVICE_RESET complete
5056 05/06/13 12:05:14: iopiDiscoveryComplete SubSystem 2 Count b InitState 1
5057 05/06/13 12:05:14: DevId [7] Reduce Queue Depth recursive retry: maxQDepth 1 : maxDepthChanged 1 : curQDepth 0
5058
5059 05/06/13 12:05:14: iopiEvent: EVENT_SAS_DISCOVERY
5060 05/06/13 12:05:14: DM_HandleDiscEvent: Discovery started on Port 0
5061
5062 05/06/13 12:05:14: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
5063 05/06/13 12:05:14: DM_HandleDevStatusChgEvent: devHandle=x000a SASAdd=4433221100000000 TaskTag=xffff ASC=x00 ASCQ=x00 IOCLogInfo x31170000 IOCStatus x8000 ReasonCode x08 - INTERNAL_DEVICE_RESET
5064
5065 05/06/13 12:05:14: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
5066 05/06/13 12:05:14: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x0
5067 05/06/13 12:05:14: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
5068 05/06/13 12:05:14: DM_HandleTopologyChgEvnt: PhyIdx 00 devHandle x000a pending removal event
5069
5070 05/06/13 12:05:14: iopiEvent: EVENT_SAS_DISCOVERY
5071 05/06/13 12:05:14: DM_HandleDiscEvent: Discovery Completed on Port 0
5072
5073 05/06/13 12:05:14: Disc-prog= 0....resetProg=0 aenCount=0 transit=0
5074
5075 05/06/13 12:05:14: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
5076 05/06/13 12:05:14: DM_HandleDevStatusChgEvent: devHandle=x000a SASAdd=4433221100000000 TaskTag=xffff ASC=x00 ASCQ=x00 IOCLogInfo x00000000 IOCStatus x0000 ReasonCode x0e - INTERNAL_DEVICE_RESET complete
5077 05/06/13 12:05:16: iopiDiscoveryComplete SubSystem 2 Count c InitState 1
5078
5079 05/06/13 12:05:16: iopiEvent: EVENT_SAS_DISCOVERY
5080 05/06/13 12:05:16: DM_HandleDiscEvent: Discovery started on Port 0
5081
5082 05/06/13 12:05:16: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
5083 05/06/13 12:05:16: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x0
5084 05/06/13 12:05:16: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
5085 05/06/13 12:05:16: Device REMOVED - phy x00 devHandle x000a devId 7 path x0
5086
5087 05/06/13 12:05:16: iopiEvent: EVENT_SAS_DISCOVERY
5088 05/06/13 12:05:16: DM_HandleDiscEvent: Discovery Completed on Port 0
5089
5090 05/06/13 12:05:16: Disc-prog= 0....resetProg=0 aenCount=0 transit=0
5091 05/06/13 12:05:16: EVT#12876-05/06/13 12:05:16: 112=Removed: PD 07(e0x20/s7)
5092 05/06/13 12:05:16: EVT#12877-05/06/13 12:05:16: 248=Removed: PD 07(e0x20/s7) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=4433221100000000,0000000000000000
5093 05/06/13 12:05:16: EVT#12878-05/06/13 12:05:16: 114=State change on PD 07(e0x20/s7) from FAILED(11) to UNCONFIGURED_BAD(1)
5094 05/06/13 12:05:16: LoadBalance 0
5095 05/06/13 12:05:18: LoadBalanceCmdBlock 0
5096 05/06/13 12:05:18: Load Balance Statistics Path0PDs 8 Path1PDs 0
5097 05/06/13 12:06:14: EVT#12879-05/06/13 12:06:14: 103=Rebuild progress on PD 06(e0x20/s6) is 20.98%(1439s)
5098 05/06/13 12:08:00: EVT#12880-05/06/13 12:08:00: 103=Rebuild progress on PD 06(e0x20/s6) is 21.98%(1545s)
5105 05/06/13 12:08:07: iopiEvent: MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST
5106 05/06/13 12:08:07: DM_HandleTopologyChgEvnt: PhysicalPort=0 NumberOfPhys=x08 NumEntries=x01 StartPhy=x0
5107 05/06/13 12:08:07: ExpStatus=x00 PhysicalPort=0 EnclosureHandle=x0001 Expander devHandle=x0000
5108 05/06/13 12:08:07: Device ADDED - phy x00 devHandle x000a parentDevHdl x0001 phyNum x00 slot x07 deviceInfo=x00000881 sasAddr=4433221100000000
5109
5110 05/06/13 12:08:07: iopiEvent: EVENT_SAS_DISCOVERY
5111 05/06/13 12:08:07: DM_HandleDiscEvent: Discovery Completed on Port 0
5112
5113 05/06/13 12:08:07: DM_DevMgrIsChipInit 0 State 400
5114 05/06/13 12:08:09: DM_DevSSUCallback: DevId x0007 errCount 0 Retry 0 Status 2
5115 05/06/13 12:08:09: DM_DevSSUCallback: SENSE Len 12 ResponseCode 70 senseKey 6 asc 29 ascq 0
5116 05/06/13 12:08:09: devId 7 - slot num - 7
5117 05/06/13 12:08:09: DM_DevNotifyRAID: Notify Done. Check for Removal
5118 05/06/13 12:08:09: gDevInfo=84235ea0, size=160
5119 05/06/13 12:08:09: Total Device = 9
5120 05/06/13 12:08:09: PD Flags State Type Size S N Vendor Product Rev P C ID SAS Addr Port Phy DevH BFw BRev
5121 05/06/13 12:08:09: --- -------- ----- ---- -------- - - -------- ---------------- ---- - - -- ---------------- ---- --- ---- ---- ----
5122 05/06/13 12:08:09: 0 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 00 4433221107000000 07 07 11 NA NA
5123 05/06/13 12:08:09: 1 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 01 4433221106000000 06 06 10 NA NA
5124 05/06/13 12:08:09: 2 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 02 4433221105000000 05 05 0f NA NA
5125 05/06/13 12:08:09: 3 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 03 4433221104000000 04 04 0e NA NA
5126 05/06/13 12:08:09: 4 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 04 4433221103000000 03 03 0d NA NA
5127 05/06/13 12:08:09: 5 f1400005 00020 00 3a38602f 1 1 0 ATA SAMSUNG HE502HJ 0001 0 0 05 4433221102000000 02 02 0c NA NA
5128 05/06/13 12:08:09: 6 f1400005 00020 00 3a38602f 1 1 0 ATA WDC WD5003ABYX-1 1S02 0 0 06 4433221101000000 01 01 0b NA NA
5129 05/06/13 12:08:09: 7 f1400005 00020 00 3a38602f 1 1 0 ATA WDC WD5003ABYX-1 1S02 0 0 07 4433221100000000 00 00 0a NA NA
5130 05/06/13 12:08:09: 20 01400005 00020 0d 0 0 0 0 DP BACKPLANE 1.10 0 0 20 5942b0b0687eb700 00 08 09 NA NA
5131 05/06/13 12:08:09: 100 00400005 00020 03 0 0 0 0 LSI SMP/SGPIO/SEP 1526 0 0 ffff 0 00 ff 00 NA NA
5132
5141 05/06/13 12:08:09: Load Balance Statistics Path0PDs 9 Path1PDs 0
5142 05/06/13 12:08:09: EVT#12881-05/06/13 12:08:09: 91=Inserted: PD 07(e0x20/s7)
5143 05/06/13 12:08:09: EVT#12882-05/06/13 12:08:09: 247=Inserted: PD 07(e0x20/s7) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=4433221100000000,0000000000000000
5144 05/06/13 12:08:09: ArDiskTypeMisMatch : NO_MIXING_VIOLATION array=0 destPD=7
5145 05/06/13 12:08:09: EVT#12883-05/06/13 12:08:09: 114=State change on PD 07(e0x20/s7) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)
5146 05/06/13 12:08:09: ArDiskTypeMisMatch : NO_MIXING_VIOLATION array=0 destPD=7
5147 05/06/13 12:08:09: EVT#12884-05/06/13 12:08:09: 114=State change on PD 07(e0x20/s7) from UNCONFIGURED_GOOD(0) to OFFLINE(10)
5148 05/06/13 12:08:09: EVT#12885-05/06/13 12:08:09: 106=Rebuild automatically started on PD 07(e0x20/s7)
5149 05/06/13 12:08:09: EVT#12886-05/06/13 12:08:09: 114=State change on PD 07(e0x20/s7) from OFFLINE(10) to REBUILD(14)
5150 05/06/13 12:12:25: EVT#12887-05/06/13 12:12:25: 103=Rebuild progress on PD 07(e0x20/s7) is 0.99%(256s)
5151 05/06/13 12:13:05: EVT#12888-05/06/13 12:13:05: 103=Rebuild progress on PD 06(e0x20/s6) is 22.98%(1850s)
5152 05/06/13 12:16:32: EVT#12889-05/06/13 12:16:32: 103=Rebuild progress on PD 07(e0x20/s7) is 1.99%(503s)
5153 05/06/13 12:18:25: EVT#12890-05/06/13 12:18:25: 103=Rebuild progress on PD 06(e0x20/s6) is 23.98%(2170s)
5154 05/06/13 12:20:37: EVT#12891-05/06/13 12:20:37: 103=Rebuild progress on PD 07(e0x20/s7) is 2.99%(748s)
5155 05/06/13 12:23:47: EVT#12892-05/06/13 12:23:47: 103=Rebuild progress on PD 06(e0x20/s6) is 24.98%(2492s)
5156 05/06/13 12:24:58: EVT#12893-05/06/13 12:24:58: 103=Rebuild progress on PD 07(e0x20/s7) is 3.99%(1009s)
5157 05/06/13 12:29:13: EVT#12894-05/06/13 12:29:13: 103=Rebuild progress on PD 06(e0x20/s6) is 25.98%(2818s)
5158 05/06/13 12:29:17: EVT#12895-05/06/13 12:29:17: 103=Rebuild progress on PD 07(e0x20/s7) is 4.99%(1268s)
5159 05/06/13 12:33:59: EVT#12896-05/06/13 12:33:59: 103=Rebuild progress on PD 07(e0x20/s7) is 5.99%(1550s)
5160 05/06/13 12:35:04: EVT#12897-05/06/13 12:35:04: 103=Rebuild progress on PD 06(e0x20/s6) is 26.98%(3169s)
5228 05/06/13 14:23:06: EVT#12963-05/06/13 14:23:06: 113=Unexpected sense: PD 05(e0x20/s5) Path 4433221102000000, CDB: 28 00 29 70 8f 00 00 02 00 00, Sense: 3/11/00
5229 05/06/13 14:23:06: Raw Sense for PD 5: f0 00 03 29 70 8f 01 0a 00 00 00 00 11 00 00 00 00 00
5230 05/06/13 14:23:06: DEV_REC:Medium Error DevId[5] devHandle c RDM=80648a00 retires=0
5231 05/06/13 14:23:06: MedErr is for: cmdId=248f, ld=2, src=4, cmd=1, lba=11d9d7, cnt=f8, rmwOp=0
5232 05/06/13 14:23:06: -> recoveryChild: ld=2 orgLi=5 recPhysArm=6 badPhysArm=7 doneFun=c080219c sRef=0 eRef=1ff recFlags=0
5233 05/06/13 14:23:06: -> RecParent: cmdId=248f, src=4, cmd=1, lba=11d9d7, cnt=f8, rmwOp=0, refs=2/79
5234 05/06/13 14:23:06: ErrLBAOffset (1) LBA(29708f00) BadLba=29708f01
5235 05/06/13 14:23:06: Read MED ERR!!! ErrLBA(1) LBA(29708f00) TotalBlks (200)
5236 05/06/13 14:23:06: MPTCMD: Issue Read ME Recovery RDM=80648a00 Cmd 1 Flags 60000 Rec State 17 Retry 1
5237 05/06/13 14:23:07: DM_REC:Completing recovery SM DevId 5 pRdm 80648a00 CQC 0 WQC 1
5238 05/06/13 14:23:07: LDBBMListAddReadError: 2 drives rebuilding: arm1: 6, arm2: 7
5239 05/06/13 14:23:07: LDBBMListAddReadError: LDBBM cannot act on LBA: 29708f01, pd: 6, firstRebuildingPd: 6secondRebuildingPd: 7!
5240 05/06/13 14:23:07: EVT#12964-05/06/13 14:23:07: 111=Unrecoverable medium error during recovery on PD 05(e0x20/s5) at 29708f01
5241 05/06/13 14:23:07: BBMProcessReadError: RECOVERY, pd=05, pdErrLba=29708f01 - puncture target drive
5242 05/06/13 14:23:07: BBMMarkBadBlock: pd=07, pdLBA=29708f01
5243 05/06/13 14:23:07: EVT#12965-05/06/13 14:23:07: 97=Puncturing bad block on PD 07(e0x20/s7) at 29708f01
5244 05/06/13 14:23:07: BBMMarkBadBlock: pd=06, pdLBA=29708f01
5245 05/06/13 14:23:07: EVT#12966-05/06/13 14:23:07: 97=Puncturing bad block on PD 06(e0x20/s6) at 29708f01
5246 05/06/13 14:23:07: DevId [5] Restore Queue Depth to 40
5247 05/06/13 14:23:08: DevId [5] Reduce Queue Depth to 1 from 40
DXi Log
On the other hand, while all those disk events were occurring and being reported under the perc log, DXi had only the following alerts issued during that time:
Apr 26 16:07:16 2013 1 UNKNOWN 23 PHYSICAL_DISK_1:0:6 PERC_H700_INTEGRATED_(SLOT_4) 23 22 SAMSUNG HE502HJ
Apr 26 16:07:16 2013 1 UNKNOWN 70 0-BOOT PERC_H700_INTEGRATED_(SLOT_4) 70 115 The RAID Volume redundancy is degraded.
Apr 26 16:07:16 2013 1 UNKNOWN 70 1-META PERC_H700_INTEGRATED_(SLOT_4) 70 115 The RAID Volume redundancy is degraded.
Apr 26 16:07:16 2013 1 UNKNOWN 70 2-DATA PERC_H700_INTEGRATED_(SLOT_4) 70 115 The RAID Volume redundancy is degraded.
Apr 26 19:58:56 2013 1 UNKNOWN 23 PHYSICAL_DISK_1:0:7 PERC_H700_INTEGRATED_(SLOT_4) 23 22 SAMSUNG HE502HJ
May 6 11:38:12 2013 1 UNKNOWN 77 SAS_PHY_1 PERC_H700_INTEGRATED_(SLOT_4) 77 124 The Alarm SAS_PHY_1 has generated a warning condition.
May 6 12:06:02 2013 1 UNKNOWN 77 SAS_PHY_0 PERC_H700_INTEGRATED_(SLOT_4) 77 124 The Alarm SAS_PHY_0 has generated a warning condition.
May 6 17:49:12 2013 1 UNKNOWN 23 PHYSICAL_DISK_1:0:5 PERC_H700_INTEGRATED_(SLOT_4) 23 22 SAMSUNG HE502HJ
May 15 15:01:14 2013 1 UNKNOWN 77 SAS_PHY_2 PERC_H700_INTEGRATED_(SLOT_4) 77 124 The Alarm SAS_PHY_2 has generated a warning condition.
May 15 18:32:03 2013 1 UNKNOWN 70 2-DATA PERC_H700_INTEGRATED_(SLOT_4) 70 120 The Volume 2-DATA has failed.
May 16 15:27:55 2013 1 UNKNOWN 70 2-DATA PERC_H700_INTEGRATED_(SLOT_4) 70 120 The Volume 2-DATA has failed.
May 16 15:44:59 2013 1 UNKNOWN 70 2-DATA PERC_H700_INTEGRATED_(SLOT_4) 70 120 The Volume 2-DATA has failed.
RESOLUTION (WORKAROUND OR FIX)
In the case we studied above were we had 2 failed disks and other disks with media errors, there is little to do. The raid in this case is a RAID6 so there was no condition for the technician to replace a third disk (the one with media errors) prior to the other two.
Once the data is punctured you faced dataloss and this may or may not be visible to the customer, because a puncture could occur on a block of data customer won't use right away or it can happen in a empty block. But if the puncture occurs on a block of data where data of the OS is stored you will see issues right away depends of the data affected.
Dell recommendation for this kind of occurrence is to initialize the RAID and restore from backup (in this case, from replication DXi)
If I can't solve, how I can avoid this issue
As mentioned above, this case we studies under this article is a extreme where there was little to be done.
But supposing you have one disk with bad blocks and one that failed. You want to replace the one with bad media, let data to rebuild, verify there is no other disk presenting bad blocks and after that you will replace the failed one.
Because DXi will have little information regarding media issues, you may want to dig the perc controller log
In case you still see block errors after replacing the bad media disk, please consult Dell regarding the replacement of the failed disk
This page was generated by the BrainKeeper Enterprise Wiki, © 2018 |