How to Gracefully Fail Over a StorNext HA Node

Take care when failing over from the current MDC in an HA pair, particularly on a StorNext appliance. In particular, avoid provoking a SMITH, which risks corrupting the filesystem or the SNSM database.

Note that failing individual parts of the configuration, e.g. simply stopping the HA filesystem, will result in a SMITH and should be avoided.

Preparation

First, check that the alternate node is functioning and ready to take over by running snhamgr status on the node you intend to fail over from:

# snhamgr status

LocalMode=default

LocalStatus=primary

RemoteMode=default

RemoteStatus=running
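The status check above can be scripted so that a failover is only attempted when all four fields match the healthy state shown. The following is a minimal sketch, not a Quantum-supplied tool: ha_ready is a hypothetical helper name, and it assumes snhamgr status prints plain key=value lines exactly as in the example output.

```shell
#!/bin/sh
# Sketch only: ha_ready is a hypothetical helper, not part of StorNext.
# It reads `snhamgr status` output on stdin and succeeds (exit 0) only if
# the local node is primary and the remote node is running, both in
# default mode, as in the example output above.
ha_ready() {
    awk -F= '
        $1 == "LocalMode"    { lm = $2 }
        $1 == "LocalStatus"  { ls = $2 }
        $1 == "RemoteMode"   { rm = $2 }
        $1 == "RemoteStatus" { rs = $2 }
        END {
            ok = (lm == "default" && ls == "primary" &&
                  rm == "default" && rs == "running")
            exit(ok ? 0 : 1)
        }'
}

# Usage on the current primary (as root):
#   snhamgr status | ha_ready && echo "alternate node ready" \
#                             || echo "do NOT fail over yet"
```

Reading from stdin keeps the check separate from the command itself, so it can be tried against saved output before being used on a live system.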

Also ensure that any resources the alternate node requires are available and operational (e.g. check disks with cvlabel, and tape libraries and drives with fs_scsi).

You may also wish to open shell windows on both nodes and tail the system logs to monitor the failover in real time.

Failover

To initiate the failover, simply stop the StorNext filesystem service:

# service cvfs stop

When the stop completes, snhamgr status should show the alternate node as primary and the local node as stopped:

# snhamgr status

LocalMode=default

LocalStatus=stopped

RemoteMode=default

RemoteStatus=primary
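If you prefer not to re-run the status command by hand, the expected end state can be polled for. This is a rough sketch under stated assumptions: wait_for_failover and the STATUS_CMD variable are hypothetical names, the default of 30 polls at 10-second intervals is an arbitrary choice, and the output format is assumed to be the key=value lines shown above.

```shell
#!/bin/sh
# Sketch only: wait_for_failover and STATUS_CMD are hypothetical names.
# STATUS_CMD defaults to the real command but can be overridden for dry runs.
STATUS_CMD=${STATUS_CMD:-"snhamgr status"}

# Poll until the local node reports stopped and the remote node reports
# primary. Arguments: number of polls (default 30), seconds between polls
# (default 10); both defaults are assumptions, not documented values.
wait_for_failover() {
    tries=${1:-30}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        out=$($STATUS_CMD 2>/dev/null)
        if printf '%s\n' "$out" | grep -qx 'LocalStatus=stopped' &&
           printf '%s\n' "$out" | grep -qx 'RemoteStatus=primary'; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Usage on the node that was just stopped (as root):
#   wait_for_failover && echo "failover complete" || echo "check the logs"
```

A non-zero return only means the expected state was not seen within the polling window; the system logs on both nodes remain the place to find out why.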

Be sure to restart the service on the local node (service cvfs start) once the failover has completed, so that it is again available to take over if the new primary fails.

Example of Filesystem Service Successfully Stopping and Starting

# service cvfs stop

Initiating stop of StorNext SNAPI component

SNAPI software stopped.

Initiating stop of StorNext TSM component

FS0285 Tertiary Manager terminate requested.

FS0279 Tertiary Manager software successfully terminated.

FS0000 01 0001348744 /usr/adic/TSM/exec/fsconfig completed: Command Successful.

Initiating stop of StorNext MSM component

Initiating the Media Manager shutdown

Setup environment variables ok

Shutting down the Media Manager system processes ... Done

System processes shut down ok

Shutting down the Media Manager servers ... Done

Servers shut down ok

Shutting down the Media Manager process server ... Done

Process server shut down ok

The Media Manager shutdown completed

Initiating stop of StorNext PSE component

Initiating stop of StorNext SRVCLOG component

Stopping...

Stopping sla with pid: 2211

Stopping ala with pid: 2221

Initiating stop of StorNext mysql component

Stopping mysqld

mysqld stopped

Initiating stop of StorNext DSM component

Stopping blockpool succeeded [ OK ]

Terminating snpolicyd, this may take up to 300 seconds

Stopping snpolicyd succeeded [ OK ]

Unmounting SNFS filesystems [ OK ]

Stopping SNFS Daemons

Disabling vips

Running '/sbin/ifconfig bond0:ha down'

Stopping SNFS PortMapper

Waiting for FSMs to finish..

SNFS Stop [ OK ]

# service cvfs start

Initiating start of StorNext DSM component

Checking maintenance license...

- The maintenance license status is: Good [ OK ]

Initializing StorNext Filesystem (SNFS)

Loading SNFS modules

net.core.rmem_max = 1048576

Multipath enabled, waiting up to 500 seconds for multipath device creation

. [ OK ]

Starting /usr/cvfs/bin/fsmpm .........

net.core.rmem_max = 131071

Starting /usr/cvfs/bin/cvfsd ... [ OK ]

Mounting the shared file system: HA_shared

Waiting for primary

Waiting for CVFS mounts to complete [ OK ]

SNFS Initialized [ OK ]