Path-To-Tape Failure in NetBackup Environment with SCSI Reservation Conflict Errors (DRAFT)
SR Information: SR 1610340
Product / Software Version and Configuration: 2.2.1 on DXi6540, OST 2.6, NDMP Path-To-Tape (PTT), Scalar i6000 tape library.
Additional configuration details: The Scalar i6000 library connected to the DXi6540 has 8 tape drives configured for drive sharing between the NetBackup master and media servers and one NetApp filer.
Problem Description: PTT stopped working on the NetBackup master and media servers, with reservation conflict errors.
Related PTR: 32483
A customer reported that PTT, which had previously been working, stopped working.
The DXi logs were filled with:
/var/log/messages:
Sep 15 06:29:18 tmwdxihou01 kernel: st3: Error 18 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
Sep 15 06:29:19 tmwdxihou01 kernel: st 5:0:4:0: reservation conflict
Sep 15 06:29:19 tmwdxihou01 kernel: st3: Error 18 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
...
Sep 14 15:30:35 tmwdxihou01 kernel: st3: Error 18 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
Sep 14 15:30:36 tmwdxihou01 kernel: st 5:0:3:0: reservation conflict
Sep 14 15:30:36 tmwdxihou01 kernel: st6: Error 18 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
/var/log/DXi/tsunami.log:
ERROR - 09/14/13-14:32:58 - ndmp tape.cc(475) [ndmp-23234] ndmpdTapeOpen() - opening tape device: /dev/alias/nst/F0024E4013, Device or resource busy (16).
…
ERROR - 09/14/13-15:15:34 - ndmp tape.cc(475) [ndmp-23517] ndmpdTapeOpen() - opening tape device: /dev/alias/nst/F0024E400D, Input/output error (5).
ERROR - 09/14/13-15:15:36 - ndmp tape.cc(404) [ndmp-23531] ndmpdTapeOpen() - opening tape device: /dev/alias/nst/F0024E400D, Input/output error (5).
ERROR - 09/14/13-15:15:39 - ndmp tape.cc(475) [ndmp-23534] ndmpdTapeOpen() - opening tape device: /dev/alias/nst/F0024E401F, Input/output error (5).
ERROR - 09/14/13-15:15:40 - ndmp tape.cc(475) [ndmp-23538] ndmpdTapeOpen() - opening tape device: /dev/alias/nst/F0024E4007, Input/output error (5).
...
ERROR - 09/14/13-16:49:19 - ndmp config.cc(641) [ndmp-10131] ndmpdConfigGetTapeInfo() - ndmpdConfigGetTapeInfo: scandir of /dev/alias/nst/4:0:3:0 failed(2) - No such file or directory
ERROR - 09/14/13-16:49:19 - ndmp config.cc(641) [ndmp-10131] ndmpdConfigGetTapeInfo() - ndmpdConfigGetTapeInfo: scandir of /dev/alias/nst/5:0:3:0 failed(2) - No such file or directory
WARN - 09/14/13-16:49:19 - ndmp config.cc(683) [ndmp-10131] ndmpdConfigGetTapeInfo() - Device /dev/alias/nst/F0024E4001/5:0:4:0 not Configured.
WARN - 09/14/13-16:49:19 - ndmp config.cc(683) [ndmp-10131] ndmpdConfigGetTapeInfo() - Device /dev/alias/nst/F0024E4007/4:0:3:0 not Configured.
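To see which drives are affected most often, the kernel messages can be tallied by SCSI address (host:channel:id:lun). The following is a minimal sketch, not a Quantum-supplied tool: the function name count_reservation_conflicts is hypothetical, and it assumes the message format shown in the /var/log/messages excerpt above.

```shell
# Tally "reservation conflict" kernel messages per SCSI address.
# Reads syslog-format text on stdin and prints "<count> <host:channel:id:lun>",
# most frequent first. The function name is hypothetical.
count_reservation_conflicts() {
    awk '
        BEGIN { suffix = ": reservation conflict" }
        match($0, /[0-9]+:[0-9]+:[0-9]+:[0-9]+: reservation conflict/) {
            # Keep only the address part of the matched text.
            addr = substr($0, RSTART, RLENGTH - length(suffix))
            count[addr]++
        }
        END { for (a in count) print count[a], a }
    ' | sort -k1,1nr -k2,2
}

# Example (run on the DXi or on a copy of its log):
#   count_reservation_conflicts < /var/log/messages
```

Lines that do not contain a reservation conflict message, such as the "Error 18" lines, are ignored.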
The messages files on the NetBackup master and media servers also showed numerous reservation conflict errors. In addition, the NetBackup "vmoprcmd" command, when run on any NetBackup host (master or media server) participating in drive sharing, showed the status of the tape drives as "PEND-TLD". One media server also showed the status of some drives as "DOWN-TLD".
All servers that accessed drives on the Scalar i6000 library were stopped. All media was unloaded from the tape drives. All tape drives were then recycled. Special device files whose names did not contain a tape drive serial number (for example, /dev/alias/nst/4:0:3:0) were deleted and reconfigured. (TS: Is this a summary of what needs to be done below? Or are these pre-requisite steps that need to be done prior to starting step 1 below?)
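The special device files named by SCSI address rather than by serial number can be located with a short listing. This is a sketch only: the function name list_hctl_aliases is hypothetical, the default directory is taken from the example path above, and it inspects only entries directly under that directory.

```shell
# Print entries directly under an alias directory whose names look like a
# SCSI address (host:channel:id:lun) rather than a drive serial number.
# The function name is hypothetical; the default path comes from the article.
list_hctl_aliases() {
    dir=${1:-/dev/alias/nst}
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        case $name in
            *[!0-9:]*) ;;   # has characters other than digits and colons: skip
            [0-9]*:[0-9]*:[0-9]*:[0-9]*) printf '%s\n' "$entry" ;;
        esac
    done
}

# Example: list_hctl_aliases /dev/alias/nst
```

The function only lists candidates; it deletes nothing, so the output can be reviewed before any cleanup.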
To resolve this issue, do the following:
# Repeat for each drive, substituting its host, channel, ID, and LUN:
echo "scsi remove-single-device 4 0 3 0" > /proc/scsi/scsi
rm -rf /dev/alias/nst/4:0:3:0
/opt/DXi/rescan-scsi-bus -r
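When several drives are involved, the commands above can be generated for each SCSI address and reviewed before they are run. The sketch below only prints the commands. The function name print_drive_reset_commands is hypothetical, and it assumes that the alias cleanup applies to every listed drive and that a single rescan after all removals is sufficient.

```shell
# Print (do not run) the removal commands for each SCSI address given as
# host:channel:id:lun. The function name is hypothetical.
print_drive_reset_commands() {
    for hctl in "$@"; do
        # Turn "4:0:3:0" into "4 0 3 0", the form written to /proc/scsi/scsi.
        fields=$(printf '%s' "$hctl" | tr ':' ' ')
        printf 'echo "scsi remove-single-device %s" > /proc/scsi/scsi\n' "$fields"
        printf 'rm -rf /dev/alias/nst/%s\n' "$hctl"
    done
    # Assumption: one rescan after all drives have been removed.
    printf '/opt/DXi/rescan-scsi-bus -r\n'
}

# Example: print_drive_reset_commands 4:0:3:0 5:0:3:0
```

Printing rather than executing keeps the sketch safe to try on any host; the generated lines would need to be run as root on the DXi.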
After reconfiguration, the tape library (TLD) should be configured and the drives should show a normal status. However, when backup jobs are run to the tape drives, the drive status changes to PEND-TLD and the backup jobs queue up.
NOTE: The customer later rebooted the media server on the advice of Symantec support, and it rejoined drive sharing. It is believed that the media server showing the DOWN-TLD drives was obtaining SCSI reservations but not releasing them, which caused the conflicts.