SR3636188 Can't mount Media, Drive goes offline

SR Information: 3636188  QUANTUM STORAGE (INDIA)

 

Problem Description: Can't mount Media into a recently replaced Drive. The Drive goes offline.

 

Product / Software Version:

 

MDC:SNFS 4.1.2

Library: Oracle STK

 

 Overview

vsmount fails and the drive is taken offline.

 

What is ACSLS?

 

Automated Cartridge System Library Software (ACSLS) is Sun/Oracle StorageTek's server software that controls a Sun StorageTek tape library. An Automated Cartridge System (ACS) is a group of tape libraries connected through passthru-ports (PTPs).

ACSLS accesses and manages information stored in one or more ACSs through command processing across a network.

The software includes a system administration component, interfaces to client system applications, and library management facilities.

 

 

 

 

Symptoms & Identifying the problem

 

The customer replaced a drive in their STK library and reported that the drive is taken offline whenever media is mounted into it with vsmount.

 

## 1 ## Log Review:

 

MSM Tac Log

 

Jan 13 16:47:47 MDS04 snmsm XdiAMTask_8[25182]: E0416(6)<1401354217>:MountAMCmd2546:  Archive NearLine_02: Medium: N02513 and Drive: 1 selected to be mounted

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7024(7)<00000>:xdiStkError110:  STK Request: 9 failed: Drive is offline.; errno: 31

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7071(7)<00000>:xdiStk3372:  STK Mount Request of Media ID: %, failed

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7087(7)<00000>:xdiUtil300:  Xdi Error: 140; Text: DRIVE ERROR: Device is not available.

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[25182]: E0714(4)<1401354217>:XdiPrimitive1958:  Archive NearLine_02: MOUNT of Medium N02513 for Drive Slot 00000,00000,00012,00005 was Failed (90 - drive offline or unavailable)

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[25182]: E0000(1)<1401354218>:XdiMountAMCmd1522:  SRVCLOG RCOMP: 12 RINST: MSM VCOMP: 49 VINST: UNKNOWN VPINST: UNKNOWN EVENT: 54 TEXT: SNMS: Mount failed for drive: 0,1,1,6 in archive: NearLine_02, error: drive offline or unavailable. Drive varied offline.
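The numeric codes in these log lines are what the rest of the analysis keys on. As a minimal sketch, the two codes can be pulled out of the Tac log with regular expressions. The patterns are written against the exact wording of the lines above and are an assumption about the log format in general, not a documented interface; the function name is hypothetical.

```python
import re

# Patterns modelled on the two error lines quoted above (assumed format).
STK_FAIL = re.compile(r"STK Request: (\d+) failed: (.+?)\.; errno: (\d+)")
XDI_ERR = re.compile(r"Xdi Error: (\d+); Text: (.+)$")


def extract_codes(lines):
    """Collect the STK errno and the XDI error number from Tac log lines."""
    found = {}
    for line in lines:
        stk = STK_FAIL.search(line)
        if stk:
            found["stk_request"] = int(stk.group(1))
            found["stk_text"] = stk.group(2)
            found["stk_errno"] = int(stk.group(3))
        xdi = XDI_ERR.search(line)
        if xdi:
            found["xdi_error"] = int(xdi.group(1))
            found["xdi_text"] = xdi.group(2)
    return found


sample = [
    "Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7024(7)<00000>:xdiStkError110:  "
    "STK Request: 9 failed: Drive is offline.; errno: 31",
    "Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7087(7)<00000>:xdiUtil300:  "
    "Xdi Error: 140; Text: DRIVE ERROR: Device is not available.",
]
print(extract_codes(sample))
```

In practice the list of lines would come from the Tac log file instead of the inline sample.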

 

 

 

 

## 2 ## Troubleshooting:

 

Verify the MSM and database configuration to make sure there is no mapping error on our side.

 

dbdrvslot - Queries for drive slot information about an archive

1:0,1,1,6:0,0,12,5:HU1208M7PB

 

This information consists of the following colon separated values:

       -  drive identifier

       -  hardware location information

       -  slot

       -  drive serial number
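To make the field order concrete, here is a small sketch that splits one dbdrvslot record into the four values listed above. The class and function names are hypothetical and only serve the illustration.

```python
from typing import NamedTuple


class DriveSlot(NamedTuple):
    drive_id: int
    hardware_location: str
    slot: str
    serial_number: str


def parse_dbdrvslot_line(line: str) -> DriveSlot:
    """Split one colon-separated dbdrvslot record into its four fields."""
    drive_id, hardware_location, slot, serial = line.strip().split(":")
    return DriveSlot(int(drive_id), hardware_location, slot, serial)


record = parse_dbdrvslot_line("1:0,1,1,6:0,0,12,5:HU1208M7PB")
print(record.hardware_location)  # 0,1,1,6
print(record.slot)               # 0,0,12,5
```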

 

Verify the Linter DB tdlmdb tables drive vs. xdicomp [ for MySQL installs, use the following syntax instead: echo "select * from xdicomp;" | mysql -D tdlmdb ]

 

Xdicomp Table - [root@MDC]# echo "select * from xdicomp;" | inl -u tdlmdb/

|ARCHIVEID  |COMPID                  |HARDWARENAME|COMPTYPE|COMPSTATE|MEDIAID|ASSIGNMENTSTATE|MEDIATYPEMAP  |SERIALNUMBER|
|          8|00000,00000,00012,00005 |0,1,1,6     |      12|        1|       |              2|00 00 00 00 10|HU1208M7PB  |

 

Drive Table - [root@MDC]# echo "select * from drive;" | inl -u tdlmdb/

|DRIVEID    |DRIVETYPE|ARCHIVEID  |DRIVESTATE|PARENTSTATE|ASSIGNMENTSTATE|MOUNTSTATE|USAGECOUNT |LOCKID     |MEDIAID|COMPID                  |
|          1|        1|          8|         1|          1|              2|         2|          0|          0|       |00000,00000,00012,00005 |

 

Note:

- The drive serial number is listed only in the tdlmdb xdicomp table (MSM config) and the tmdb cfgdir table (TSM config)

- CompID links the xdicomp and drive tables

- ArchiveID needs to be the same in both tables

- ArchiveID needs to match the ArchiveID of the physical library in the tdlmdb archive table
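The checks in this note can be sketched as a small consistency test. It assumes that the zero-padded COMPID (00000,00000,00012,00005) and the slot reported by dbdrvslot (0,0,12,5) are the same value in different notation, which is how the outputs above read. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize_location(value: str) -> Tuple[int, ...]:
    """Turn '00000,00000,00012,00005' or '0,0,12,5' into (0, 0, 12, 5)."""
    return tuple(int(part) for part in value.strip().split(","))


def mapping_problems(xdicomp: Dict[str, str], drive: Dict[str, str],
                     dbdrvslot_line: str) -> List[str]:
    """Return readable mismatches; an empty list means the rows agree."""
    drive_id, hardware, slot, serial = dbdrvslot_line.strip().split(":")
    problems = []
    if normalize_location(xdicomp["COMPID"]) != normalize_location(drive["COMPID"]):
        problems.append("COMPID differs between xdicomp and drive")
    if int(xdicomp["ARCHIVEID"]) != int(drive["ARCHIVEID"]):
        problems.append("ARCHIVEID differs between xdicomp and drive")
    if normalize_location(slot) != normalize_location(xdicomp["COMPID"]):
        problems.append("dbdrvslot slot does not match xdicomp COMPID")
    if hardware != xdicomp["HARDWARENAME"].strip():
        problems.append("dbdrvslot hardware location does not match HARDWARENAME")
    if serial != xdicomp["SERIALNUMBER"].strip():
        problems.append("serial number differs")
    if int(drive_id) != int(drive["DRIVEID"]):
        problems.append("drive identifier differs")
    return problems


# Values copied from the query outputs in this case.
xdicomp_row = {"ARCHIVEID": "8", "COMPID": "00000,00000,00012,00005",
               "HARDWARENAME": "0,1,1,6", "SERIALNUMBER": "HU1208M7PB"}
drive_row = {"DRIVEID": "1", "ARCHIVEID": "8", "COMPID": "00000,00000,00012,00005"}
print(mapping_problems(xdicomp_row, drive_row, "1:0,1,1,6:0,0,12,5:HU1208M7PB"))  # []
```

An empty list for the values in this case matches the conclusion that there is no mapping error on the MSM side.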

 

 

 

 

Verify Drive Configuration File & Verify Lockinfo File

 

Configuration Files for MSM are located at /usr/adic/MSM/internal/config/

 

[root@MDC]# cat /usr/adic/MSM/internal/config/drive_file_NearLine_02

 

ArchiveName       ArchiveType       Logical Drive#    Physical DriveID

 

NearLine_02 STKSILO 0 0,0,1,3

NearLine_02 STKSILO 1 0,0,1,6

NearLine_02 STKSILO 2 0,0,1,9

NearLine_02 STKSILO 3 0,0,1,12

NearLine_02 STKSILO 4 0,1,1,3

NearLine_02 STKSILO 5 0,1,1,6

NearLine_02 STKSILO 6 0,1,1,9

NearLine_02 STKSILO 7 0,1,1,12

 

The lock info file (lockInfo_<libraryName>) tracks where drives are located in the ACSLS-connected library (by the library's equipment location). Knowing the equipment location, our dismount calls can tell ACSLS which drive equipment location needs to be unlocked and which drive location can be mounted or dismounted.

 

[root@MDC]# od -S7 /usr/adic/MSM/internal/config/lockInfo_NearLine_02

0000000 0,0,1,3

0000054 0,0,1,6

0000130 0,0,1,9

0000204 0,0,1,12

0000260 0,1,1,3

0000334 0,1,1,6

0000410 0,1,1,9

0000464 0,1,1,12

 

Note:

- The lock file is used only with ACSLS and records whether each drive is in use.

- You can use xxd or od to look at the contents

- For each drive section it contains the slot information, and zeros if there is no lock. If bits are set, there is a lock.

- The file is recreated if missing when MSM is restarted.
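od prints offsets in octal by default, so the listing above can be converted to decimal byte positions and compared with the drive configuration file. The sketch below works on the text output shown in this case. The resulting 44-byte spacing between entries is only an observation from this listing, not a documented record layout, and the function name is hypothetical.

```python
OD_OUTPUT = """\
0000000 0,0,1,3
0000054 0,0,1,6
0000130 0,0,1,9
0000204 0,0,1,12
0000260 0,1,1,3
0000334 0,1,1,6
0000410 0,1,1,9
0000464 0,1,1,12
"""

# Physical drive IDs from drive_file_NearLine_02, in logical drive order.
DRIVE_FILE_IDS = ["0,0,1,3", "0,0,1,6", "0,0,1,9", "0,0,1,12",
                  "0,1,1,3", "0,1,1,6", "0,1,1,9", "0,1,1,12"]


def parse_od_strings(text):
    """Return (decimal_offset, location) pairs from 'od -S' text output."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, location = line.split(None, 1)
        entries.append((int(offset, 8), location.strip()))  # offsets are octal
    return entries


entries = parse_od_strings(OD_OUTPUT)
strides = {b[0] - a[0] for a, b in zip(entries, entries[1:])}
print(strides)                                             # {44}
print([loc for _, loc in entries] == DRIVE_FILE_IDS)       # True
```

Every location in the lock file matches the drive configuration file, in the same order, which is what one would expect from a consistent setup.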

 

 

 

 

 

## 3 ## Root Cause:

 

The drive configuration and MSM configuration files look good, so let's re-review the MSM Tac logs and errors.

 

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7024(7)<00000>:xdiStkError110:  STK Request: 9 failed: Drive is offline.; errno: 31

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[23454]: E7087(7)<00000>:xdiUtil300:  Xdi Error: 140; Text: DRIVE ERROR: Device is not available.

 

Jan 13 16:47:56 MDS04 snmsm XdiAMTask_8[25182]: E0714(4)<1401354217>:XdiPrimitive1958:  Archive NearLine_02: MOUNT of Medium N02513 for Drive Slot 00000,00000,00012,00005 was Failed (90 - drive offline or unavailable)

 

 

From the error messages above, we can identify the code that logged each error and check its header files to resolve the error codes.

 

xdiStkError.c

NAME: XdiStkErrorMap

*  PURPOSE:

*  This function maps a STK specific error into a generic XDI error code.

 

case STATUS_DRIVE_OFFLINE:

         xdi_error = XDI_ERR_DRV_NOT_AVAIL;

      break;

               

xdiStkErrorP.h

/* 30 */

[30]"Drive is not in the library.",

[31]"Drive is offline.",

 

xdiUtilP.h

  /* 140 */ "DRIVE ERROR: Device is not available.",

 

XdiUtility.C

case XDI_ERR_DRV_NOT_AVAIL:

         errorCode = AMERR_DRIVE_NOT_ONLINE;

      break;

 

AMCmd_defs.h

AMERR_DRIVE_NOT_ONLINE = 90,
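The excerpts above form a three-step chain from the STK status to the code reported by MSM. As a sketch, the chain can be modelled with lookup tables. The names and texts are copied from the excerpts. The pairing of errno 31 with STATUS_DRIVE_OFFLINE and of 140 with XDI_ERR_DRV_NOT_AVAIL is an assumption inferred from the log lines, since the excerpts do not show those numeric definitions. The function name is hypothetical.

```python
# Message texts copied from the xdiStkErrorP.h and xdiUtilP.h excerpts.
STK_TEXT = {30: "Drive is not in the library.", 31: "Drive is offline."}
XDI_TEXT = {140: "DRIVE ERROR: Device is not available."}

# Assumption: numeric values inferred from "errno: 31" and "Xdi Error: 140".
STK_STATUS = {31: "STATUS_DRIVE_OFFLINE"}
XDI_CODE = {"XDI_ERR_DRV_NOT_AVAIL": 140}

# Mappings shown in xdiStkError.c, XdiUtility.C and AMCmd_defs.h.
STK_TO_XDI = {"STATUS_DRIVE_OFFLINE": "XDI_ERR_DRV_NOT_AVAIL"}
XDI_TO_AM = {"XDI_ERR_DRV_NOT_AVAIL": "AMERR_DRIVE_NOT_ONLINE"}
AM_CODE = {"AMERR_DRIVE_NOT_ONLINE": 90}


def trace_stk_errno(errno):
    """Follow an STK errno through the XDI layer to the AM error code."""
    stk_name = STK_STATUS[errno]
    xdi_name = STK_TO_XDI[stk_name]
    am_name = XDI_TO_AM[xdi_name]
    xdi_number = XDI_CODE[xdi_name]
    return [
        (stk_name, errno, STK_TEXT[errno]),
        (xdi_name, xdi_number, XDI_TEXT[xdi_number]),
        (am_name, AM_CODE[am_name], "drive offline or unavailable"),
    ]


for name, number, text in trace_stk_errno(31):
    print(f"{number:>3}  {name:<24} {text}")
```

Reading the chain from top to bottom reproduces the three numbers seen in the log: 31, 140 and 90.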

 

The description of the function XdiStkErrorMap makes clear that the error comes directly from the STK itself.

 

Logging into the ACSLS server and querying the drive status reveals that the drive is logically offline within the STK.

 

STK Admin Guide ACSLS

https://docs.oracle.com/cd/E19775-01/802UpAdm/802UpAdm.pdf

 

 

ACSSA> q drive all

2016-01-14 14:35:23               Drive Status

Identifier   State           Status      Volume               Type

   0, 0, 1, 3 online          in use      N02111               HP-LTO5

   0, 0, 1, 6 online          in use                           HP-LTO5

   0, 0, 1, 9 online          in use      N01502               HP-LTO5

   0, 0, 1,12 online          in use      N01376               HP-LTO5

   0, 1, 1, 3 online          in use      N02107               HP-LTO5

   0, 1, 1, 6 offline         available                        HP-LTO5

   0, 1, 1, 9 online          available                        HP-LTO5

   0, 1, 1,12 online          in use      N02260               HP-LTO5

  

ACSSA> vary drive 0,1,1,6 online

2016-01-14 14:52:26 52988    drive   0, 1, 1, 6: Online.

Vary: drive   0, 1, 1, 6 varied online.

 

ACSSA> q drive all

2016-01-14 14:52:35               Drive Status

Identifier   State           Status      Volume               Type

   0, 1, 1, 6 online          available                        HP-LTO5

     

Varying the drive online resolved the problem.
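On a library with many drives, spotting the one offline entry by eye is error-prone. A minimal sketch that scans captured "q drive all" output for drives whose state is not online could look like this. The regular expression is written against the column layout shown in this case and is an assumption about the output format; the function name is hypothetical.

```python
import re

# Matches "   0, 1, 1, 6 offline ..." and captures the four ID parts and the state.
DRIVE_LINE = re.compile(r"^\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\s+(\S+)")


def drives_not_online(output):
    """Return drive identifiers (e.g. '0,1,1,6') whose state is not 'online'."""
    result = []
    for line in output.splitlines():
        match = DRIVE_LINE.match(line)
        if match and match.group(5) != "online":
            result.append(",".join(match.group(1, 2, 3, 4)))
    return result


captured = """\
2016-01-14 14:35:23               Drive Status
Identifier   State           Status      Volume               Type
   0, 0, 1, 3 online          in use      N02111               HP-LTO5
   0, 0, 1,12 online          in use      N01376               HP-LTO5
   0, 1, 1, 6 offline         available                        HP-LTO5
   0, 1, 1, 9 online          available                        HP-LTO5
"""
print(drives_not_online(captured))  # ['0,1,1,6']
```

The identifier it returns is in the same form that the vary command in the transcript above takes.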

 

 

 

 

 

What we learn from this case:

  • How to verify a Drive Config from MSM point of view
  • How to identify the Code throwing the Error and resolve its Description
  • What the lockInfo_<libname> is for and how to read it
  • How to list Drive Status on an ACSLS

 



This page was generated by the BrainKeeper Enterprise Wiki, © 2018