HA Resets

After a suspected HA Reset, the first place to look is the /usr/cvfs/debug/smithlog file, which contains one-line time-stamped descriptions of probable causes for the reset.
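As a minimal sketch of that first check, the helper below prints the most recent lines of the smithlog. Only the file path comes from the text above. The `show_smithlog` function and the `SMITHLOG` variable are hypothetical names used for illustration, not part of the product, and the default of 20 lines is an arbitrary choice.

```shell
# Illustrative helper, not a StorNext tool: show the newest smithlog entries.
# SMITHLOG defaults to the path named in this section; override it to test.
SMITHLOG="${SMITHLOG:-/usr/cvfs/debug/smithlog}"

show_smithlog() {
    # Number of trailing lines to print (assumed default: 20).
    count="${1:-20}"
    if [ -r "$SMITHLOG" ]; then
        tail -n "$count" "$SMITHLOG"
    else
        echo "smithlog not found or not readable: $SMITHLOG" >&2
        return 1
    fi
}
```

After sourcing the function, running `show_smithlog 5` on the MDC that reset would print the last five entries, which can then be compared against the time of the suspected reset.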

There are three methods for producing an HA Reset:

  1. Expiration of an HA Monitor timer.
  2. Exit of the active HaShared FSM while the shared file system is mounted on the active MDC.
  3. Invocation of the command snhamgr force smith by a script or manually by an administrator.

The smithlog file is written by the fsmpm process, so there would not be an entry in the file when an fsmpm exit results in an HA Reset.

Caution: Using the force smith command to administratively fail over a system in a production environment is not recommended. The preferred way to gracefully fail over from the primary node to the secondary node is to stop CVFS on the primary and restart it after the secondary node has become primary. For example, on the node that is currently primary, run:

# service cvfs stop

Wait for the secondary to become primary, then run:

# service cvfs start