SR3636188 Can't mount media, drive goes offline
SR Information: 3636188 QUANTUM STORAGE (INDIA)
Problem Description: Can't mount media into a recently replaced drive. The drive goes offline.
Product / Software Version:
MDC: SNFS 4.1.2, Library: Oracle STK
Overview
vsmount fails and the drive is taken offline.
What is ACSLS?
Automated Cartridge System Library Software (ACSLS) is Sun/Oracle StorageTek's server software that controls a Sun StorageTek tape library. An Automated Cartridge System (ACS) is a group of tape libraries connected through passthru-ports (PTPs).
ACSLS accesses and manages information stored in one or more ACSs through command processing across a network.
The software includes a system administration component, interfaces to client system applications, and library management facilities.
Symptoms & Identifying the problem
The customer replaced a drive in their STK library and reported that the drive is taken offline whenever media is mounted into it using vsmount.
## 1 ## Log Review:
MSM Tac Log
Jan 13 16:47:47 MDS04 snmsm XdiAMTask_8[25182]: E0416(6)<1401354217>:MountAMCmd2546: Archive NearLine_02: Medium: N02513 and Drive: 1 selected to be mounted
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7024(7)<00000>:xdiStkError110: STK Request: 9 failed: Drive is offline.; errno: 31
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7071(7)<00000>:xdiStk3372: STK Mount Request of Media ID: %, failed
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7087(7)<00000>:xdiUtil300: Xdi Error: 140; Text: DRIVE ERROR: Device is not available.
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[25182]: E0714(4)<1401354217>:XdiPrimitive1958: Archive NearLine_02: MOUNT of Medium N02513 for Drive Slot 00000,00000,00012,00005 was Failed (90 - drive offline or unavailable)
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[25182]: E0000(1)<1401354218>:XdiMountAMCmd1522: SRVCLOG RCOMP: 12 RINST: MSM VCOMP: 49 VINST: UNKNOWN VPINST: UNKNOWN EVENT: 54 TEXT: SNMS: Mount failed for drive: 0,1,1,6 in archive: NearLine_02, error: drive offline or unavailable. Drive varied offline.
## 2 ## Troubleshooting:
Verify the MSM and database configuration to make sure there is no mapping error on our side.
dbdrvslot - Queries for drive slot information about an archive
1:0,1,1,6:0,0,12,5:HU1208M7PB
This information consists of the following colon-separated values:
- drive identifier
- hardware location information
- slot
- drive serial number
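The colon-separated record above can be split mechanically. The following is a minimal Python sketch, assuming each dbdrvslot output line has exactly the four fields listed; the `DriveSlot` type and `parse_dbdrvslot_line` helper are hypothetical names for illustration, not part of StorNext.

```python
from typing import NamedTuple


class DriveSlot(NamedTuple):
    """One dbdrvslot record (field names chosen for this sketch)."""
    drive_id: int
    hardware_location: str
    slot: str
    serial_number: str


def parse_dbdrvslot_line(line: str) -> DriveSlot:
    """Split one 'id:hardware:slot:serial' line into its four fields."""
    parts = line.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {line!r}")
    drive_id, hardware_location, slot, serial_number = parts
    return DriveSlot(int(drive_id), hardware_location, slot, serial_number)


# The record shown in this case.
record = parse_dbdrvslot_line("1:0,1,1,6:0,0,12,5:HU1208M7PB")
print(record.drive_id, record.hardware_location, record.slot, record.serial_number)
```

For the record in this case, the hardware location 0,1,1,6 is the value to compare against the ACSLS drive identifier later on.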
Verify the Linter DB tdlmdb table drive against the table xdicomp. [ For MySQL installs, use the following syntax instead: echo "select * from xdicomp;" | mysql -D tdlmdb ]
Xdicomp Table - [root@MDC]# echo "select * from xdicomp;" | inl -u tdlmdb/
| ARCHIVEID | COMPID                  | HARDWARENAME | COMPTYPE | COMPSTATE | MEDIAID | ASSIGNMENTSTATE | MEDIATYPEMAP   | SERIALNUMBER |
| 8         | 00000,00000,00012,00005 | 0,1,1,6      | 12       | 1         |         | 2               | 00 00 00 00 10 | HU1208M7PB   |
Drive Table - [root@MDC]# echo "select * from drive;" | inl -u tdlmdb/
| DRIVEID | DRIVETYPE | ARCHIVEID | DRIVESTATE | PARENTSTATE | ASSIGNMENTSTATE | MOUNTSTATE | USAGECOUNT | LOCKID | MEDIAID | COMPID                  |
| 1       | 1         | 8         | 1          | 1           | 2               | 2          | 0          | 0      |         | 00000,00000,00012,00005 |
Note:
- The drive serial number is listed only in the tdlmdb xdicomp table (MSM config) and the tmdb cfgdir table (TSM config).
- The CompID links both tables.
- The ArchiveID must be the same in both tables.
- The ArchiveID must match the ArchiveID of the physical library in the tdlmdb archive table.
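The cross-checks in the notes above can be written down as a small script. This is only a sketch: the rows are typed in by hand from the query output of this case, `check_mapping` and `slot_to_compid` are hypothetical helpers, and the five-digit zero padding of the CompID is an assumption inferred from the sample values (slot 0,0,12,5 versus CompID 00000,00000,00012,00005).

```python
from typing import Dict, List


def slot_to_compid(slot: str) -> str:
    """Zero-pad each slot field to five digits (format assumed from the sample)."""
    return ",".join(f"{int(part):05d}" for part in slot.split(","))


def check_mapping(dbdrvslot_line: str,
                  xdicomp: Dict[str, object],
                  drive: Dict[str, object]) -> List[str]:
    """Return a list of human-readable mismatches; an empty list means consistent."""
    drive_id, hardware, slot, serial = dbdrvslot_line.strip().split(":")
    problems = []
    if xdicomp["COMPID"] != drive["COMPID"]:
        problems.append("CompID differs between xdicomp and drive")
    if slot_to_compid(slot) != xdicomp["COMPID"]:
        problems.append("dbdrvslot slot does not match xdicomp CompID")
    if xdicomp["ARCHIVEID"] != drive["ARCHIVEID"]:
        problems.append("ArchiveID differs between xdicomp and drive")
    if xdicomp["HARDWARENAME"] != hardware:
        problems.append("hardware location differs between dbdrvslot and xdicomp")
    if xdicomp["SERIALNUMBER"] != serial:
        problems.append("serial number differs between dbdrvslot and xdicomp")
    if int(drive_id) != drive["DRIVEID"]:
        problems.append("drive identifier differs between dbdrvslot and drive")
    return problems


# Values copied from the output shown in this case.
xdicomp_row = {"ARCHIVEID": 8, "COMPID": "00000,00000,00012,00005",
               "HARDWARENAME": "0,1,1,6", "SERIALNUMBER": "HU1208M7PB"}
drive_row = {"DRIVEID": 1, "ARCHIVEID": 8, "COMPID": "00000,00000,00012,00005"}

print(check_mapping("1:0,1,1,6:0,0,12,5:HU1208M7PB", xdicomp_row, drive_row))
```

The check against the archive table is left out because that table's output is not shown in this case.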
Verify Drive Configuration File & Verify Lockinfo File
Configuration Files for MSM are located at /usr/adic/MSM/internal/config/
[root@MDC]# cat /usr/adic/MSM/internal/config/drive_file_NearLine_02
ArchiveName ArchiveType Logical Drive# Physical DriveID
NearLine_02 STKSILO 0 0,0,1,3
NearLine_02 STKSILO 1 0,0,1,6
NearLine_02 STKSILO 2 0,0,1,9
NearLine_02 STKSILO 3 0,0,1,12
NearLine_02 STKSILO 4 0,1,1,3
NearLine_02 STKSILO 5 0,1,1,6
NearLine_02 STKSILO 6 0,1,1,9
NearLine_02 STKSILO 7 0,1,1,12
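To look up which logical drive number belongs to a physical drive ID, the drive file can be parsed as whitespace-separated columns. The sketch below assumes the four-column layout shown above and skips any line that does not fit it, such as a header; `parse_drive_file` is a hypothetical helper name.

```python
from typing import Dict


def parse_drive_file(text: str) -> Dict[int, str]:
    """Map logical drive number -> physical drive ID from drive_file_<archive> content."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields with a numeric logical drive number.
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        _archive, _archive_type, logical, physical = fields
        mapping[int(logical)] = physical
    return mapping


# Content copied from the drive file shown in this case.
sample = """\
ArchiveName ArchiveType Logical Drive# Physical DriveID
NearLine_02 STKSILO 0 0,0,1,3
NearLine_02 STKSILO 1 0,0,1,6
NearLine_02 STKSILO 2 0,0,1,9
NearLine_02 STKSILO 3 0,0,1,12
NearLine_02 STKSILO 4 0,1,1,3
NearLine_02 STKSILO 5 0,1,1,6
NearLine_02 STKSILO 6 0,1,1,9
NearLine_02 STKSILO 7 0,1,1,12
"""

drives = parse_drive_file(sample)
# Reverse lookup: which logical drive entries point at physical drive 0,1,1,6?
print([logical for logical, physical in drives.items() if physical == "0,1,1,6"])
```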
The lock file (lockInfo_<libraryName>) tracks where drives are located in the ACSLS-connected library (by the library's equipment location). By knowing that equipment location, our dismount calls can tell ACSLS which drive equipment location needs to be unlocked, and which drive location can be mounted or dismounted.
[root@MDC]# od -S7 /usr/adic/MSM/internal/config/lockInfo_NearLine_02
0000000 0,0,1,3
0000054 0,0,1,6
0000130 0,0,1,9
0000204 0,0,1,12
0000260 0,1,1,3
0000334 0,1,1,6
0000410 0,1,1,9
0000464 0,1,1,12
Note:
- The lock file is used only for ACSLS; it records whether each drive is in use.
- You can use xxd or od to look at the contents.
- For each drive section, it contains the slot information and 0's if there is no lock. If bits are set, there is a lock.
- If the file is missing, it is recreated when MSM is restarted.
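The same inspection can be done in a few lines of Python. This is a rough sketch under stated assumptions, not a description of the real file format: the 44-byte section size is only inferred from the od offsets above (octal 54 = 44), the location is assumed to be a NUL-terminated string at the start of each section, and any non-zero byte after it is treated as "lock set". `read_lock_records` is a hypothetical helper.

```python
from typing import List, Tuple

RECORD_SIZE = 44  # assumption: inferred from the od offsets 0000000, 0000054, 0000130, ...


def read_lock_records(data: bytes, record_size: int = RECORD_SIZE) -> List[Tuple[int, str, bool]]:
    """Return (offset, location, locked) for each drive section in the file content."""
    records = []
    for offset in range(0, len(data), record_size):
        chunk = data[offset:offset + record_size]
        # Location string up to the first NUL; the remainder is assumed to hold lock bits.
        name, _, rest = chunk.partition(b"\x00")
        if not name:
            continue
        locked = any(rest)  # True if any byte after the location is non-zero
        records.append((offset, name.decode("ascii", errors="replace"), locked))
    return records


# Synthetic two-drive sample built to match the assumed layout (no locks set).
sample = b"".join(
    loc.encode("ascii").ljust(RECORD_SIZE, b"\x00") for loc in ("0,0,1,3", "0,0,1,6")
)
for offset, location, locked in read_lock_records(sample):
    print(f"{offset:07o} {location} locked={locked}")
```

On a real system the bytes would come from reading /usr/adic/MSM/internal/config/lockInfo_<libraryName> in binary mode; compare the result with the od or xxd output before trusting it.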
## 3 ## Root Cause:
The drive configuration and the MSM configuration files look good, so let's re-review the MSM TAC logs and errors.
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7024(7)<00000>:xdiStkError110: STK Request: 9 failed: Drive is offline.; errno: 31
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7087(7)<00000>:xdiUtil300: Xdi Error: 140; Text: DRIVE ERROR: Device is not available.
Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[25182]: E0714(4)<1401354217>:XdiPrimitive1958: Archive NearLine_02: MOUNT of Medium N02513 for Drive Slot 00000,00000,00012,00005 was Failed (90 - drive offline or unavailable)
From the error messages above, we can identify the source files that logged the errors and check their header files to resolve the error codes.
xdiStkError.c
NAME: XdiStkErrorMap
* PURPOSE:
* This function maps a STK specific error into a generic XDI error code.
case STATUS_DRIVE_OFFLINE:
xdi_error = XDI_ERR_DRV_NOT_AVAIL;
break;
xdiStkErrorP.h
/* 30 */
[30]"Drive is not in the library.",
[31]"Drive is offline.",
xdiUtilP.h
/* 140 */ "DRIVE ERROR: Device is not available.",
XdiUtility.C
case XDI_ERR_DRV_NOT_AVAIL:
errorCode = AMERR_DRIVE_NOT_ONLINE;
break;
AMCmd_defs.h
AMERR_DRIVE_NOT_ONLINE = 90,
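The chain of translations in these excerpts can be summarized as plain lookup tables. The Python sketch below only restates the values quoted above and in the log (STK errno 31, XDI error 140, AM error 90); the dictionaries and the `trace_error` helper are illustrative and are not the product's code.

```python
from typing import Dict

# Texts and codes copied from the excerpts and log lines in this case.
STK_STATUS_TEXT = {30: "Drive is not in the library.", 31: "Drive is offline."}
STK_TO_XDI = {31: 140}   # STATUS_DRIVE_OFFLINE -> XDI_ERR_DRV_NOT_AVAIL
XDI_TEXT = {140: "DRIVE ERROR: Device is not available."}
XDI_TO_AM = {140: 90}    # XDI_ERR_DRV_NOT_AVAIL -> AMERR_DRIVE_NOT_ONLINE


def trace_error(stk_errno: int) -> Dict[str, object]:
    """Follow an STK errno through the XDI layer up to the AM error code."""
    if stk_errno not in STK_TO_XDI:
        raise ValueError(f"no mapping known in this sketch for STK errno {stk_errno}")
    xdi_code = STK_TO_XDI[stk_errno]
    return {
        "stk_errno": stk_errno,
        "stk_text": STK_STATUS_TEXT[stk_errno],
        "xdi_error": xdi_code,
        "xdi_text": XDI_TEXT[xdi_code],
        "am_error": XDI_TO_AM[xdi_code],
    }


print(trace_error(31))
```

Reading the result from left to right reproduces the three log lines: errno 31 from the STK request, Xdi Error 140, and the final mount failure with code 90.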
The description of the function XdiStkErrorMap makes clear that the error comes directly from the STK itself.
Logging in to ACSLS and querying the drive status reveals that the drive is logically offline within the STK.
STK Admin Guide ACSLS
https://docs.oracle.com/cd/E19775-01/802UpAdm/802UpAdm.pdf
ACSSA> q drive all
2016-01-14 14:35:23 Drive Status
Identifier State Status Volume Type
0, 0, 1, 3 online in use N02111 HP-LTO5
0, 0, 1, 6 online in use HP-LTO5
0, 0, 1, 9 online in use N01502 HP-LTO5
0, 0, 1,12 online in use N01376 HP-LTO5
0, 1, 1, 3 online in use N02107 HP-LTO5
0, 1, 1, 6 offline available HP-LTO5
0, 1, 1, 9 online available HP-LTO5
0, 1, 1,12 online in use N02260 HP-LTO5
ACSSA> vary drive 0,1,1,6 online
2016-01-14 14:52:26 52988 drive 0, 1, 1, 6: Online.
Vary: drive 0, 1, 1, 6 varied online.
ACSSA> q drive all
2016-01-14 14:52:35 Drive Status
Identifier State Status Volume Type
0, 1, 1, 6 online available HP-LTO5
Varying the drive online resolved the problem.
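When a library has many drives, the offline ones can be picked out of saved "q drive all" output with a short script. This Python sketch assumes the column layout shown above (identifier, then state as the next word); `offline_drives` is a hypothetical helper, and the generated text simply mirrors the vary command used in this case.

```python
import re
from typing import List

# Identifier such as "0, 1, 1, 6" or "0, 0, 1,12", followed by the state column.
DRIVE_LINE = re.compile(r"^\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\s+(\S+)")


def offline_drives(query_output: str) -> List[str]:
    """Return the identifiers (e.g. '0,1,1,6') of all drives whose state is offline."""
    result = []
    for line in query_output.splitlines():
        match = DRIVE_LINE.match(line)
        if match and match.group(5) == "offline":
            result.append(",".join(match.group(1, 2, 3, 4)))
    return result


# Excerpt of the query output from this case.
sample = """\
2016-01-14 14:35:23 Drive Status
Identifier State Status Volume Type
0, 0, 1, 3 online in use N02111 HP-LTO5
0, 0, 1,12 online in use N01376 HP-LTO5
0, 1, 1, 6 offline available HP-LTO5
0, 1, 1, 9 online available HP-LTO5
"""

for drive in offline_drives(sample):
    # Command to type at the ACSSA> prompt, as done in this case.
    print(f"vary drive {drive} online")
```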
What we learn from this case: