SNFS: File System Not Coming Up Due To Numerous Failures

 

SR Information: SR 1471448
Product / Software Version: This issue occurred on a system with DXi 2.1.3 software. This issue may occur on DXi systems with other software versions.

Problem Description: Due to system corruption, a DXi experienced an fsm panic and SNFS did not properly come up.

PTR Related:

  • Bug 36545 - No fsm restart with 2 core's in 1 hour

 

OVERVIEW

Due to system corruption, a DXi experienced an fsm panic and the StorNext File System (SNFS) did not properly come up. This troubleshooting methodology includes the symptoms found on the customer's system and the procedures used to resolve this issue.

  


SYMPTOMS (IDENTIFYING THE PROBLEM)

The following symptoms were found on the customer's DXi system:

 

 

 

 


RESOLUTION (WORKAROUND OR FIX)

DXi software is designed to prevent fsm from coming up again after two failed attempts to bring up vol0 that each end in a core dump within 1 hour. vol0 is the file system mounted at the /snfs mount point.
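Because vol0 backs the /snfs mount point, one quick way to confirm whether SNFS actually came up is to look for a cvfs entry in the mount table. The following is a minimal sketch, not part of the DXi tooling: the function name is_cvfs_mounted and its optional second argument (an alternate mount table file, useful for testing) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not a DXi or StorNext utility).
# Returns 0 if the given mount point is mounted with file system type "cvfs".
# $1 = mount point, $2 = mount table to read (defaults to /proc/mounts).
is_cvfs_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Mount table fields: device, mount point, fs type, options, dump, pass
    awk -v mp="$mount_point" '
        $2 == mp && $3 == "cvfs" { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, `is_cvfs_mounted /snfs && echo "SNFS is mounted"` prints the message only when /snfs is present with type cvfs.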

 

  1. You can verify this using the fsmlist command as shown below:

 

 

  2. To resolve this issue, start and activate vol0 with the following commands under cvadmin:

snadmin> start vol0

Starting FSS locally.

 

snadmin> activate vol0

Activate FSM "vol0"

 

  3. Check the messages log file. If you receive the following error message, verify that SNFS is still not accessible.


 

  4. To clear the counters, try rebooting the machine. If the problem persists, contact the SES team via the escalation form and Oracle OCR.

Note: On the customer's system, the reboot was unresponsive. After the reboot command was executed, the prompt indicated that services were still up and running. A SMITH was then forced using the following command:

 

# /opt/DXi/util/qtm_cm_util --smith
 

  5. Once SNFS was back up and running, file system integrity was checked using the following procedure:
    1. Stop services: 

# service heartbeat stop
Stopping High-Availability services:
                                                           [  OK  ]

 

    2. Start SNFS:

# /etc/init.d/cvfs start
 
Initiating start of ADIC DSM component
Initializing StorNext Filesystem (SNFS)
Loading SNFS modules
net.core.rmem_max = 1048576
Starting /usr/cvfs/bin/fsmpm.
net.core.rmem_max = 1048575
Starting /usr/cvfs/bin/cvfsd...
Mounting SNFS filesystems
/dev/cvfsctl1_vol0 on /snfs type cvfs (rw,noatime,cachebufsize=256k,nfsopen=yes,timeout=600,sparse=yes)
SNFS Initialized                                           [  OK  ]
 

 

    3. Unmount SNFS:

# umount /snfs

 

    4. Under cvadmin, stop vol0:

# /usr/cvfs/bin/cvadmin

snadmin (vol0) > stop vol0
Stop FSS "vol0"
 
FSS 'vol0' on localhost.localdomain stop initiated.
FSS 'vol0' stopped.
Select FSM "none"
 
snadmin> exit

 

    5. Run cvfsck without applying changes:

# /usr/cvfs/bin/cvfsck -nv vol0

 

The output of cvfsck can be found in /usr/cvfs/data/vol0/trace/.
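To locate the most recent cvfsck output in that directory, a small helper like the one below may be convenient. This is only a sketch: the function name newest_entry_in is hypothetical, and because the article does not say how the trace files are named, the helper simply picks the most recently modified entry.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the most recently modified
# entry in a directory (prints nothing if the directory is empty or missing).
newest_entry_in() {
    dir=$1
    # ls -t sorts by modification time, newest first
    ls -t "$dir" 2>/dev/null | head -n 1
}
```

For example, `newest_entry_in /usr/cvfs/data/vol0/trace` prints the name of the latest entry, which can then be opened for review.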

 

Contact the SES team if you need assistance reviewing the results of cvfsck -nv.

 

Note: If you are familiar with fixing file system issues, you can proceed with cvfsck -v.
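The read-only integrity check above can be summarized as an ordered checklist. The sketch below only prints the commands taken from this procedure and executes nothing; the function name print_fsck_plan and its two parameters (volume name and mount point, defaulting to vol0 and /snfs) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the read-only integrity-check sequence.
# Nothing is executed; the commands are echoed in the order used above.
# $1 = volume name (default vol0), $2 = mount point (default /snfs)
print_fsck_plan() {
    vol=${1:-vol0}
    mnt=${2:-/snfs}
    cat <<EOF
service heartbeat stop
/etc/init.d/cvfs start
umount $mnt
/usr/cvfs/bin/cvadmin    # at the prompt: stop $vol, then exit
/usr/cvfs/bin/cvfsck -nv $vol
EOF
}
```

Running `print_fsck_plan` with no arguments lists the five steps for vol0, which can be reviewed before each command is entered by hand.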
 


 

 


