EDLM Troubleshooting

Michael Richter: EDLM Troubleshooting V0.3.9 

 

Scalar i6000 Scan Policies

 

Enable External Application (StorNext)

– Library checks “suspect count” on dismount

– Scan performed if “suspect count” is set and above threshold

 

StorNext threshold

- Only scans it once at the threshold (MEDIA_SUSPECT_THRESHOLD)

- RAS Ticket generated on the library

– If the scan fails, “medcopy” is started

- Write protect the media on StorNext

- RAS Ticket generated on the library

 

Tape Alert thresholds

– 3 is the default but configurable

 

Time interval

– Day increments

– Be careful: it is possible to thrash the system

 

Scan on import

- All media entered into the library partition are scanned, even media that are already known and vaulted

 

 

 

EDLM Setup i6k (example: CGG Veritas)

EDLM i6k (example: CGG Setup )

 

 

 

 

 

 

EDLM Snapi Workflow

 

EDLM

The EDLM scan is transparent to StorNext. The i6k library is supposed to report the media as present in its home slot while it scans the media.

 

SNAPI i6k PlugIn

- test system status to test connectivity

- check for suspect counts of media on dismount

- check for a vault associated with a media export

- initiate a copy operation and precede it with a media write protect

 

Note: we use pass-through commands to determine the suspect count and associated vaults, and also for the copy operation, so that we can copy current and previous revisions.

 

 

 

 

EDLM SNAPI Commands & List of the interfaces/data fields we are using in the EDLM and ActiveVault features for the tape libraries

 

Enabling SNAPI logging won’t show you anything unless there are errors, so the best bet is to capture the commands as they come into SN.

You can get the command and status from the log file listed in the table, but you won’t be able to tell whether it came from a SNAPI call or not.

 

 

 

SNAPI command: CopyMedia
SN command: fsmedcopy -r -b
SN log for command & status: TSM/logs/history/hist_01
Comments: Supply a volser and block on this call. Not sure if this is still used. At one point there was discussion about using a pass-through command to run fsmedcopy -r -a -b so that all files get copied instead of just the active ones.

SNAPI command: GetMediaStatus
SN command: vsmedqry, fsmedinfo
SN log for command & status: MSM/logs/history/hist_01, TSM/logs/history/hist_01
Comments: We supply a volser and get back a MediaInfo instance. We call getSuspectCount() on this instance after an unload and getWriteProtected() prior to a CopyMedia command.

SNAPI command: GetSystemStatus
SN command: cat /usr/adic/.version, TSM_control status, vsping, database_control status, SRVCLOG_control status, cvadmin -e select <fsname>
SN log for command & status: none
Comments: Used to get the SystemInfo instance. We call getSystemVersion() and getState( SystemInfo::system ) on this instance to validate compatibility and online/offline state.

SNAPI command: PassThru
SN command: vsmedqry
SN log for command & status: MSM/logs/history/hist_01
Comments: The GetMediaStatus SNAPI command does not return the pending archive field, which the plug-in needs to determine the target archive for the move, hence the use of vsmedqry through the pass-through command. Used with the 'vsmedqry' command and a volser. Parse the "Pending Archive" out of the result string.

SNAPI command: PassThru
SN command: showsysparm
SN log for command & status: none
Comments: Used with the 'showsysparm' command to get FS_MAX_ACTIVE_TAPECOPIES and MEDIA_SUSPECT_THRESHOLD values.

SNAPI command: SetMediaState
SN command: fschmedstate
SN log for command & status: TSM/logs/history/hist_01
Comments: Used with the volser of a cartridge and SetMediaState::PROTECT as the state prior to a CopyMedia command.
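The PassThru comments mention parsing the "Pending Archive" field out of the vsmedqry result string. A minimal Python sketch of that step follows. The layout of the vsmedqry output is not shown on this page, so the one "label: value" pair per line format, the sample text and the helper name pending_archive are assumptions made only for illustration.

```python
import re

# Hypothetical vsmedqry-style result text; the real layout may differ.
SAMPLE_RESULT = """\
Media ID:          ABC755
Current Archive:   vault1
Pending Archive:   i6k_archive1
"""

# Assumes one "label: value" pair per line (not confirmed by this page).
PENDING_ARCHIVE_RE = re.compile(
    r"^[ \t]*Pending Archive[ \t]*:[ \t]*(\S*)[ \t]*$", re.MULTILINE
)


def pending_archive(result_text):
    """Return the pending archive name, or None if the field is absent or empty."""
    match = PENDING_ARCHIVE_RE.search(result_text)
    if match is None or not match.group(1):
        return None
    return match.group(1)


print(pending_archive(SAMPLE_RESULT))
```

Returning None for a missing or empty field keeps the "no pending move" case separate from a parse failure in the caller.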

 

 

 

SNAPI Config / Install

 

StorNext

 

/usr/adic/SNAPI/config/snapi.cfg


The snapi.cfg file needs an entry for each MDC, so both addresses are listed in it. This has to be done on both MDCs.

  

---------------------- example--------------------------

 

<?xml version="1.0"?>

<SNAPI_CONFIG>

<PARAMETER name="serverName" value="10.163.8.222"/>

<PARAMETER name="serverName" value="10.163.8.223"/>

<PARAMETER name="serverPort" value="61776"/>

<PARAMETER name="clientTimeOut" value="1800"/>

</SNAPI_CONFIG>

 

-----------------------example-------------------------

 

Note: SNAPI has to be restarted after the configuration file is changed

 

SNAPI_control stop

SNAPI_control start

 

Note: Without both IPs in the config file StorNext will not reissue the request to the other server if appropriate.

Please ensure that SNAPI is running on both MDCs as well.

 

SNAPI_control status
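The requirement that both MDC addresses appear in snapi.cfg can be checked with a short script. The Python sketch below parses the example configuration shown above and reports any expected MDC address that is missing. The helper names read_parameters and missing_servers are hypothetical, and the script only inspects the file; it does not talk to SNAPI.

```python
import xml.etree.ElementTree as ET

# Example configuration copied from this page.
SAMPLE_CFG = """<?xml version="1.0"?>
<SNAPI_CONFIG>
<PARAMETER name="serverName" value="10.163.8.222"/>
<PARAMETER name="serverName" value="10.163.8.223"/>
<PARAMETER name="serverPort" value="61776"/>
<PARAMETER name="clientTimeOut" value="1800"/>
</SNAPI_CONFIG>
"""


def read_parameters(cfg_text):
    """Return {name: [values, ...]} for every PARAMETER element."""
    root = ET.fromstring(cfg_text)
    params = {}
    for elem in root.iter("PARAMETER"):
        params.setdefault(elem.get("name"), []).append(elem.get("value"))
    return params


def missing_servers(cfg_text, expected_mdcs):
    """Return the expected MDC addresses that have no serverName entry."""
    listed = read_parameters(cfg_text).get("serverName", [])
    return [ip for ip in expected_mdcs if ip not in listed]


print(missing_servers(SAMPLE_CFG, ["10.163.8.222", "10.163.8.223"]))
```

On a real system the text would be read from /usr/adic/SNAPI/config/snapi.cfg on each MDC and checked against the same list of addresses.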


See the attached SNAPI 2.0.3 guide for installation and configuration guidelines.


Scalar i6k
see UserGuide: Chapter 09: Configuring EDLM 
see UserGuide: Chapter 11: Configuring Access to StorNext 

 

 

 

 

Scalar i6k Troubleshooting ( Example CGG Veritas )

 

1. General Log Analysis for EDLM activity

 

TDM_Session.log

serial              partition  mount_time              unmount_time            mb_read  mb_written  motion_time  volser
HU1129HE5G (EDLM)   2          2013-04-26 12:59:56+00  2013-04-26 13:01:02+00  116      4           0            ABC755L5
HU1131HRAA          1          2013-04-26 14:22:07+00  2013-04-26 14:26:22+00  116      4           622          ABC755L5
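The session rows above can be split into fields to find the mounts performed by the EDLM drive and how long each one lasted. The Python sketch below is only an illustration: it assumes whitespace-separated columns in the order of the header line, that the "(EDLM)" marker after the drive serial identifies an EDLM scan mount, and that the timestamps always carry a two-digit offset such as "+00". The function name parse_session_row is hypothetical.

```python
from datetime import datetime

ROWS = [
    "HU1129HE5G (EDLM) 2 2013-04-26 12:59:56+00 2013-04-26 13:01:02+00 116 4 0 ABC755L5",
    "HU1131HRAA 1 2013-04-26 14:22:07+00 2013-04-26 14:26:22+00 116 4 622 ABC755L5",
]


def _parse_time(date_part, time_part):
    # The log prints a two-digit offset ("+00"); pad it to "+0000" for %z.
    return datetime.strptime(f"{date_part} {time_part}00", "%Y-%m-%d %H:%M:%S%z")


def parse_session_row(line):
    """Split one TDM_Session.log row into a dict (assumed column order)."""
    tokens = line.split()
    is_edlm = "(EDLM)" in tokens
    if is_edlm:
        tokens.remove("(EDLM)")
    if len(tokens) != 10:
        raise ValueError(f"unexpected column count: {line!r}")
    (serial, partition, m_date, m_time, u_date, u_time,
     mb_read, mb_written, motion_time, volser) = tokens
    mounted = _parse_time(m_date, m_time)
    unmounted = _parse_time(u_date, u_time)
    return {
        "serial": serial,
        "edlm": is_edlm,
        "partition": int(partition),
        "mounted": mounted,
        "unmounted": unmounted,
        "seconds_mounted": int((unmounted - mounted).total_seconds()),
        "mb_read": int(mb_read),
        "mb_written": int(mb_written),
        "motion_time": int(motion_time),
        "volser": volser,
    }


for row in ROWS:
    info = parse_session_row(row)
    print(info["volser"], info["serial"], "EDLM" if info["edlm"] else "host",
          info["seconds_mounted"], "s")
```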

 

Michael: work/description needed

 

lmserver.log

2013-04-26 12:49:30,297  INFO (SnmpMediaBuilder.java:332) User manually inserted media: ABC755L5 at: 48

2013-04-26 12:59:56,858  INFO (ManagementTrapHandler.java:624) Tape Drive mounted: [1, 3, 1, 11, 1, 1] SN: HU1129HE5G

2013-04-26 13:01:07,059  INFO (ManagementTrapHandler.java:650) Tape Drive dismounted: [1, 3, 1, 11, 1, 1] SN: HU1129HE5G

2013-04-26 14:22:08,069  INFO (ManagementTrapHandler.java:624) Tape Drive mounted: [1, 1, 1, 12, 1, 1] SN: HU1131HRAA
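To follow EDLM activity in lmserver.log it helps to pair each "Tape Drive mounted" entry with the matching "Tape Drive dismounted" entry for the same drive serial number. The Python sketch below does this for the lines quoted above. The regular expression is derived only from those four sample lines, so other lmserver.log message variants are not covered, and the function name pair_mounts is hypothetical.

```python
import re
from datetime import datetime

LOG_LINES = [
    "2013-04-26 12:49:30,297  INFO (SnmpMediaBuilder.java:332) User manually inserted media: ABC755L5 at: 48",
    "2013-04-26 12:59:56,858  INFO (ManagementTrapHandler.java:624) Tape Drive mounted: [1, 3, 1, 11, 1, 1] SN: HU1129HE5G",
    "2013-04-26 13:01:07,059  INFO (ManagementTrapHandler.java:650) Tape Drive dismounted: [1, 3, 1, 11, 1, 1] SN: HU1129HE5G",
    "2013-04-26 14:22:08,069  INFO (ManagementTrapHandler.java:624) Tape Drive mounted: [1, 1, 1, 12, 1, 1] SN: HU1131HRAA",
]

EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+INFO .*?"
    r"Tape Drive (?P<event>mounted|dismounted): \[[^\]]*\] SN: (?P<serial>\S+)"
)


def pair_mounts(lines):
    """Return (completed, still_open) mount records keyed by drive serial."""
    still_open = {}
    completed = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a mount/dismount entry
        stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        serial = match["serial"]
        if match["event"] == "mounted":
            still_open[serial] = stamp
        elif serial in still_open:
            start = still_open.pop(serial)
            completed.append((serial, start, stamp, (stamp - start).total_seconds()))
    return completed, still_open


done, pending = pair_mounts(LOG_LINES)
for serial, start, end, seconds in done:
    print(serial, start, end, seconds)
print("still mounted:", sorted(pending))
```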

 

Michael: description needed about import/export with regard to EDLM Scan on Import

 

 

 

MeDIATables.log

 

media.session

  id  |       start_date       |        end_date        | num_incomplete | num_good | num_bad | num_suspect | num_unsupported | test_state | continue_on_error

------+------------------------+------------------------+----------------+----------+---------+-------------+-----------------+------------+-------------------

2388 | 2013-04-26 12:57:51+00 | 2013-04-26 13:01:03+00 |              0 |        1 |       0 |           0 |               0 |          1 | f

 

 media.sessionresults

  id  | aisle_num | frame_num | rack_num | section_num | column_num | row_num |  voltag  | session_id | media_id |  drive_id  |     last_run_date      | test_state | test_type | test_result | static_test_status | static_test_error_code | dynamic_test_status | dynamic_test_error_code | priority

------+-----------+-----------+----------+-------------+------------+---------+----------+------------+----------+------------+------------------------+------------+-----------+-------------+--------------------+------------------------+---------------------+-------------------------+----------

3129 |         0 |         6 |        1 |           4 |          1 |       4 | ABC755L5 |       2388 |     3080 | HU1129HE5G | 2013-04-26 13:00:28+00 |          3 |         1 |           1 |                512 |                    768 |                1029 |                    1281 |        2

 

test_state:3 = COMPLETE

test_result:1 = GOOD

static_test_status: 512 = CM_SCAN_COMPLETE,

static_test_error_code: 768 = CM_SCAN_IS_GOOD

 

 

Michael: description needed, correct place for mapping information?

 

//*

 

 

test_type:

 

1 = SIMPLE

2 = NORMAL

3 = FULL

 

test_state:

 

1 = PENDING

2 = IN_PROGRESS

3 = COMPLETE

4 = STOPPED

5 = PAUSED

 

test_result:

 

0 = NOT_COMPLETED

1 = GOOD

2 = UNSUPPORTED

3 = SUSPECT

4 = BAD

 

These are the mappings for the numbers in static_test_status, static_test_error_code, dynamic_test_status and dynamic_test_error_code:

 

1 = TA_READ_WARNING,

3 = TA_HARD_ERROR,

4 = TA_MEDIA,

5 = TA_READ_FAILURE,

10 = TA_NO_REMOVAL,

11 = TA_CLEANING_MEDIA,

12 = TA_UNSUPPORTED_FORMAT,

13 = TA_RECOVERABLE_CARTRIDGE_FAILURE,

15 = TA_CM_FAILURE,

16 = TA_FORCED_EJECT,

18 = TA_TAPE_DIRECTORY_CORRUPTED,

19 = TA_NEARING_MEDIA_LIFE,

20 = TA_CLEANING_REQUESTED,

48 = TA_DIRECTORY_INVALID,

55 = TA_LOAD_FAILURE,

56 = TA_UNRECOVERABLE_UNLOAD_FAILURE,

59 = TA_WORM_INTEGRITY_FAILURE,

60 = TA_WORM_OVERWRITE_ATTEMPTED,

512 = CM_SCAN_COMPLETE,

513 = CM_SCAN_PAUSED,

514 = CM_SCAN_PENDING,

515 = CM_SCAN_NOT_RUN,

516 = CM_SCAN_IN_PROGRESS,

768 = CM_SCAN_IS_GOOD,

769 = CM_SCAN_NA,

770 = CM_SCAN_FAILED_TO_RECIEVE_CM_DATA,

771 = CM_SCAN_CM_HARDWARE_FAILURE,

772 = CM_SCAN_THREAD_COUNT_THRESHOLD_EXCEEDED,

773 = CM_SCAN_WRITE_PASS_THRESHOLD_EXCEEDED,

774 = CM_SCAN_UNCORRECTED_ERRORS,

775 = CM_SCAN_UNABLE_TO_LOAD,

776 = CM_SCAN_UNABLE_TO_UNLOAD,

777 = CM_SCAN_NOT_PRESENT,

778 = CM_SCAN_NO_COMPATIBLE_DRIVE,

1024 = SCAN_COMPLETE,

1025 = SCAN_PAUSED,

1026 = SCAN_PENDING,

1027 = SCAN_NOT_RUN,

1028 = SCAN_IN_PROGRESS,

1029 = SCAN_NOT_CONFIGURED,

1030 = SCAN_STOPPED,

1280 = SCAN_IS_GOOD,

1281 = SCAN_NA,

1282 = SCAN_FAILED_IO_BLADE_COMM,

1283 = SCAN_FAILED_TO_RECIEVE_SCAN_DATA,

1284 = SCAN_UNEXPECTED_EOD,

1285 = SCAN_UNFORMATTED_TAPE,

1286 = SCAN_FAILED_TO_READ_TAPE_DATA,

1287 = SCAN_UNRECOVERED_READ_ERRORS,

1288 = SCAN_PLACE_HOLDER,

1289 = SCAN_CORRUPT_DATA_FORMAT,

1296 = SCAN_MECHANICAL_FAILURE,

1297 = SCAN_SEVERELY_DEGRAGED,

1298 = SCAN_UNABLE_TO_LOAD,

1299 = SCAN_UNABLE_TO_UNLOAD,

1300 = SCAN_CLEANING_CARTRIDGE,

1301 = SCAN_CM_FAULT,

1302 = SCAN_UNKNOWN_MEDIA_TYPE,

1303 = SCAN_SCAN_ABORTED,

1304 = SCAN_MEDIA_NOT_PRESENT,

1305 = SCAN_ENCRYPTED_MEDIA,

1312 = SCAN_BLANK_MEDIA,

1313 = SCAN_BLOCK_SIZE_EXCEEDS_MAX,

1314 = SCAN_FUP_TAPE,

1315 = SCAN_DRIVE_CM_READ_FAIL,

     

 

 

 

*//

 

 

 

 

 

dbDumpOutput

 

Michael: work/description needed

 

 

2. EDLM/SNAPI-related RAS Tickets

 

 Error Code Lookup Tool (ECLT): https://qsweb.quantum.com/users/OPS_site/errorCodeIntro.php

 

 

 

 

 

 

 

 

StorNext Troubleshooting

 

 

 

1) SNAPI - force an EDLM scan from within SN by setting a high suspect count and the mark flag

 

Example:

 

 echo "use tmdb; update mediadir set Mark='X',Susp='4' where mediaid='AAW914';" | mysql

 

fsmount <mediaid>

fsdismount -m <mediaid>

 

2) SNAPI - force the AEL to send a SNAPI request to initiate a fsmedcopy

 

The default value of the Thread Count Threshold is 9900.

Changing the EDLM Thread Count Threshold to 1 allows the media to be marked suspect during an EDLM scan.

 

 

1. ssh to library IP 

2. login as ilinkacc (password is the admin password) 

3. su (password is dallas) 

 

This value can be adjusted by: 

psql -Uilinkacc i2kdb -c "INSERT INTO media.settings VALUES(DEFAULT, 'MeDIA_ThreadCountThreshold', <n> )" 

 

Update it with: 

psql -Uilinkacc i2kdb -c "UPDATE media.settings SET value = <n> WHERE name = 'MeDIA_ThreadCountThreshold'" 

 

Delete it and return to the default: 

psql -Uilinkacc i2kdb -c "DELETE FROM media.settings WHERE name = 'MeDIA_ThreadCountThreshold'" 

 

Where <n> is any signed 32-bit value. If you set it to -1, all tapes would fail; 1 would fail any tape that has been threaded at least once, etc.
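The effect of a given threshold can be reasoned about with a small sketch. The Python code below validates that <n> fits a signed 32-bit integer, models the comparison, and builds the UPDATE command shown above. The ">=" comparison is an assumption inferred only from the two examples in the text (-1 fails every tape, 1 fails any tape threaded at least once); it is not taken from library documentation. The function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
DEFAULT_THRESHOLD = 9900  # default stated on this page


def check_threshold(n):
    """Return n if it is a valid signed 32-bit integer, else raise."""
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("threshold must be an integer")
    if not INT32_MIN <= n <= INT32_MAX:
        raise ValueError("threshold must fit in a signed 32-bit value")
    return n


def would_fail(thread_count, threshold=DEFAULT_THRESHOLD):
    # Assumed comparison, inferred from the -1 and 1 examples in the text.
    return thread_count >= check_threshold(threshold)


def update_command(n):
    """Build the psql UPDATE command from this page for threshold n."""
    n = check_threshold(n)
    return (
        'psql -Uilinkacc i2kdb -c "UPDATE media.settings '
        f"SET value = {n} WHERE name = 'MeDIA_ThreadCountThreshold'\""
    )


print(would_fail(0, -1), would_fail(1, 1), would_fail(120))
print(update_command(1))
```

The thread count passed to would_fail would come from the library.mediastats query described under "Checking Current Thread Count of a Tape".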

 

For the change to take effect, reboot the library or just restart tmmd:

/etc/init.d/tmmd restart 

 

Checking Current Thread Count of a Tape 

 

To determine the current thread count of a tape that has been loaded in a drive at least once in the library (003175L3 in this case, note the % at the end): 

psql -Uilinkacc i2kdb -c "SELECT vol_tag, thread_count from library.mediastats WHERE vol_tag LIKE '003175L3%'" 


 

 

 

Use Cases / Service Request

 

 

SR 1560792 - CGG Veritas - QFE replaced the MCB on the CGG Veritas library, restored an old config from April 2013, then upgraded the MCB from i8.1 to i10.3

 

Application Client Plug-In is inoperable

 

 "Application Client Plug-In is inoperable"

    R2, H1, Degraded: D2350, RQ=0, By TapeDriveMgr @ Tue May 14 12:34:53 2013

    K/C/Q = "No Sense: No Additional Sense Information", SN = ""

    Tag: 01_35_07_00_00000000 Error Modifier:0x0

    Desc"Failed to load the EDLM extension: snapi-2.0.1"

 

 

 

Since we didn't perform any changes on the StorNext side, it had to be an issue on the i6k.

The i6k was showing the correct configuration information for the SNAPI plug-in and the EDLM config.

The ticket itself is pretty clear: it cannot load the EDLM extension/plug-in.

Also, a quick check of the i6k repair pages suggests reloading and re-configuring the plug-in.

This seems to be a known bug (Bug 33197 - EDLM Extension load failure): the save/restore configuration doesn't store the extension/plug-in, so it has to be re-installed.

 

 

 

 

 SR 1554934 - CGGVERITAS - EDLM Scan on Import causes TSM/MSM to run into inconsistency

 

Descr: Enabling Scan on Import and moving tapes from the vault to the library leaves SNSM with an inconsistency for media on the EDLM scan list

 

Analysis: 

 

 

i6k lmserver.log - medium physically inserted into the library:

2013-04-26 12:49:30,297  INFO (SnmpMediaBuilder.java:332) User manually inserted media: ABC755L5 at: 48

 

 

 

i6k TDM_Session.log - reflects the EDLM scan interval

serial              partition  mount_time              unmount_time            mb_read  mb_written  motion_time  volser
HU1129HE5G (EDLM)   2          2013-04-26 12:59:56+00  2013-04-26 13:01:02+00  116      4           0            ABC755L5

 

 

StorNext MSM Tac Log

Apr 26 12:55:43 redmeta01.uk.cgg.com snmsm ArcDisp[13444]: E0999(7)<1111641345>:AD_ArcEnterCmd574:  ABC755: action mount move 

Apr 26 12:55:43 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0462(6)<1111635229>:AMTaskFuncs1410:  Archive i6k_archive1: Received Mount command (Media ID: ABC755,  Drive List) 

Apr 26 12:55:43 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0999(7)<1111635229>:MountAMCmd1645:  MountAMCmd::SelectResources  DriveID:18  MediaID:ABC755 

Apr 26 12:55:43 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0416(6)<1111635229>:MountAMCmd2546:  Archive i6k_archive1: Medium: ABC755 and Drive: 18 selected to be mounted 

Apr 26 12:55:53 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0714(4)<1111635229>:XdiPrimitive1958:  Archive i6k_archive1: MOUNT of Medium ABC755 for Drive Slot 00000,00000,00012,00261 was Failed (304 - medium not found in archive)

Apr 26 12:55:53 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0999(7)<1111635229>:ArchMedia457:  ArchMedia::ArchMedia(ABC755) 'Database'

Apr 26 12:55:53 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0104(3)<1111635229>:MountAMCmd3046:  Archive i6k_archive1: INCONSISTENCY EXISTS: Mount - Assigned Medium: ABC755 not found - Error: 40 

Apr 26 12:55:53 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: E0138(6)<1111635229>:MountAMCmd3586:  Archive i6k_archive1: Mount command failed for (Media ID: ABC755, Drive List) - Error: media not available for use

 

                

 

You can see the media was imported into the library partition and shortly afterwards queued for EDLM Scan on Import.

In theory, the i6k is expected to report the home slot for the media during an EDLM scan, which makes the scan transparent to MSM and delays the mount request until the scan has finished.

In this scenario the i6k didn't report the home slot to the application, so the application lost track of the tape and ended up with an inconsistency.

This has been escalated and addressed in Automation Bug 48016 - Import quick scan usurps SN media
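When checking whether a system is hitting the same problem, the MSM tac log can be searched for the "INCONSISTENCY EXISTS" message and the affected media IDs collected. The Python sketch below shows one way to do that. The pattern is derived only from the log lines quoted in this SR, so other message variants are not covered, and the function name find_inconsistencies is hypothetical.

```python
import re

# Two lines copied from the MSM tac log excerpt of this SR.
TAC_LINES = [
    "Apr 26 12:55:53 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: "
    "E0714(4)<1111635229>:XdiPrimitive1958:  Archive i6k_archive1: MOUNT of Medium "
    "ABC755 for Drive Slot 00000,00000,00012,00261 was Failed "
    "(304 - medium not found in archive)",
    "Apr 26 12:55:53 redmeta01.uk.cgg.com snmsm XdiAMTask_3[13449]: "
    "E0104(3)<1111635229>:MountAMCmd3046:  Archive i6k_archive1: INCONSISTENCY "
    "EXISTS: Mount - Assigned Medium: ABC755 not found - Error: 40 ",
]

INCONSISTENCY_RE = re.compile(
    r"Archive (?P<archive>[^:\s]+): INCONSISTENCY EXISTS: .*?Medium: (?P<media>\S+)"
)


def find_inconsistencies(lines):
    """Return (archive, media_id) pairs for every inconsistency message."""
    hits = []
    for line in lines:
        match = INCONSISTENCY_RE.search(line)
        if match:
            hits.append((match["archive"], match["media"]))
    return hits


print(find_inconsistencies(TAC_LINES))
```

The media IDs found this way can then be compared with the volsers that the library shows in the TDM_Session.log EDLM rows for the same time window.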

 

 

SR 3355462 - Skyvision - forcing EDLM Scan via StorNext doesn't work

 

Description: Support tried to force an EDLM scan by setting the suspect count to the threshold, which did not trigger an EDLM scan

 

Analysis:

The i6k library had the "Media Identifier" set to "pass through" rather than "disabled".

This caused EDLM to send the full barcode in the command to SNAPI, which resulted in:

fsmedinfo A00007L6

SNSM had no idea about this media so the request failed. 

The i6k configuration was modified so that "Media Identifier" was set to "disabled" (SNSM was stopped before the library was modified).

Once everything was restarted, the test worked: the media was mounted in the EDLM drive and tested.
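The mismatch in this SR is between the full barcode sent by the library (A00007L6) and the media ID known to SNSM (A00007 in this case, in the same way the MSM log above shows ABC755 for barcode ABC755L5). The Python sketch below illustrates the difference by stripping the media type suffix. It assumes a six-character volser followed by a two-character suffix starting with "L"; other label formats are left unchanged. The function name is hypothetical, and the actual fix remains the "Media Identifier" setting on the library.

```python
import re

# Assumes a six-character volser plus a two-character media type suffix
# such as "L5" or "L6", as in the barcodes quoted on this page.
BARCODE_RE = re.compile(r"^([A-Z0-9]{6})(L[A-Z0-9])$")


def media_id_from_barcode(barcode):
    """Return the barcode without its media type suffix, if it has one."""
    label = barcode.strip()
    match = BARCODE_RE.match(label)
    return match.group(1) if match else label


for label in ("A00007L6", "ABC755L5", "A00007"):
    print(label, "->", media_id_from_barcode(label))
```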

 

 

 

 

 

EDLM Related Bugs

 

Bug 41930 - SNAPI: GetSystemStatus returns SUBFAILURE if the fsmlist file has any commented out lines in it 

Bug 48016 - Import quick scan usurps SN media

Bug 33197 - EDLM Extension load failure


 

 

 

Bug 37274 - provide interface for EDLM to know which barcode format to use.

Bug 63132 - Successful EDLM Scan should reset Media suspect count

Bug 56435 - EDLM scan can trigger multiple unwanted media check

 

Edit:

17.05.2017 - added 2 more SNSM bugs related to EDLM

 

ToDo

- SNAPI PlugIn i6k Config/install + Test

- Test Snapi Communication between Library and SN

- restructure wiki page

 

Attachments
Title Last Updated Updated By
6-01375-07_SNAPI_2.0.3_Storage_Manager_API_Guide_RevA.pdf
01/06/2015 03:05 AM Michael Richter
SN42-TOI-EDLM-Slides.pdf
05/06/2013 04:57 AM Michael Richter

