StorNext RAS Message – File System Component [x] Not Responding
The message that a StorNext filesystem is not responding has two forms, which can be distinguished by the “SR Notes” section in the body of the message.
Form 1: <Reporting System> : fs <”file system name”> on host <FSS host system> not currently accessible (possibly stopped administratively). Thread <thread ID>pid 0, still trying. Ticket creation time: <date & time>
Form 2: <Reporting System> : fs <”file system name”> : Timeout while attempting to force data flush for file <”filename”> (inode <inode number>) for file system <”file system name”> on host <Problem system>. See cvlog for more details. Ticket creation time: <date & time>
The following discussion shows how to deal with each of these messages.
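When RAS messages are triaged by script, the two forms can be told apart by the phrases quoted above. The following is a minimal sketch, assuming the SR Notes text has already been extracted into a string; the function name `classify_sr_notes` is hypothetical and not part of StorNext.

```python
import re
from typing import Optional

# Distinguishing phrases taken from the two message forms quoted above.
FORM_1 = re.compile(r"not currently accessible")
FORM_2 = re.compile(r"Timeout while attempting to force data flush")


def classify_sr_notes(sr_notes: str) -> Optional[int]:
    """Return 1 or 2 for the matching form, or None if neither phrase is present."""
    if FORM_1.search(sr_notes):
        return 1
    if FORM_2.search(sr_notes):
        return 2
    return None
```

A return value of None means the text matches neither form and should be reviewed manually.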
Mac clients may incorrectly generate this message when they are rebooted. Apple has been informed of this issue, which should be addressed in a future release of macOS.
If this message was not generated after a Mac client rebooted, it indicates that network communication was lost between the client and the File System Manager (FSM) on the Metadata Controller (MDC).
- Was the loss of network communication due to the FSM being stopped administratively on the MDC? If so, restart the FSM. Communication should then be re-established.
- Is only one client affected?
- A single RAS/log message indicates that the issue is most likely with that one client. If only one client is affected, the problem is isolated to that client and its connections to the metadata network.
A client connects to the FSM through the address shown in the fsnameservers file, or if that file is not present, in the fsforeignservers file on the client. Use this address on the client to check connectivity with the FSM.
- If the problem does not appear to be an isolated issue on a single client, examine the MDC system log, and any other RAS/log messages.
- If more than one client is affected, but not all clients, the problem resides in part of the network infrastructure. To isolate the potential cause, determine which parts of the infrastructure are affected.
- If all clients are affected, the cause is most likely related to the MDC connections to the network or the FSM itself. Start by troubleshooting the MDCs. You can use the CLI commands cvadmin and top to check if the FSMs are active and are not overconsuming resources.
- If necessary, contact Quantum Support for assistance.
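The client-side connectivity check described above can be scripted. The following is a rough sketch under several assumptions: the file lives at /usr/cvfs/config/fsnameservers (a common Linux location, which may differ on your installation), each non-comment line begins with one address, lines starting with "#" are comments, and the system `ping` accepts the Linux/macOS `-c` option. A successful ping shows only basic network reachability; it does not prove that the FSM itself is answering.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed location on Linux clients; adjust for your installation.
FSNAMESERVERS = Path("/usr/cvfs/config/fsnameservers")


def parse_nameservers(text: str) -> List[str]:
    """Return the addresses in fsnameservers-style text, skipping blanks and '#' comments."""
    addresses = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            addresses.append(entry.split()[0])
    return addresses


def is_reachable(address: str, timeout_s: float = 5.0) -> bool:
    """Send one ICMP echo request with the system ping; False on failure or timeout."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if FSNAMESERVERS.exists():
        for addr in parse_nameservers(FSNAMESERVERS.read_text()):
            print(addr, "reachable" if is_reachable(addr) else "NOT reachable")
    else:
        print(f"{FSNAMESERVERS} not found; check the fsforeignservers file instead")
```

Running this on each affected client and comparing the results helps show whether the fault lies with one client, a network segment, or the MDC.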
This RAS message was first introduced in StorNext version 4.7.0.
Mac clients may incorrectly cause this message to be generated if they are allowed to go into sleep mode while file systems are mounted.
To prevent this, disable sleep mode on any Mac that mounts StorNext file systems. See this Apple support article:
Overview for Form 2
When an MDC receives a request from a client to access a file that is already being accessed by another client, it proceeds as follows:
- The MDC sends a request to the first client to flush its data buffer, so that the second client can access the file.
- After the client responds that its data has been flushed, the file is opened in shared mode for both clients.
- If the original client fails to respond after multiple attempts (~45 seconds in SN 4.7.x, ~150 seconds in SN 5.x), this message is generated. It indicates that the client did not respond to the flush request, and that the second client will be allowed to open the file for writing. Note that this may cause data coherency issues, since the state of the first client is unknown.
- This message points to issues on the client system. Examine the following potential causes:
- The client is in a ‘hung’ condition where it cannot respond to external requests. Further troubleshooting will be necessary to determine the cause of the hung condition.
- The Ethernet connection to the metadata network is down for this client. A client connects to the FSM through the address shown in the fsnameservers file, or if that file is not present, in the fsforeignservers file on the client. Use this address on the client to check connectivity with the FSM.
- Examine the client logs, which may point to the cause of this issue.
- If further assistance is required, contact Quantum Support.
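The flush sequence described in the overview for Form 2 can be pictured with a toy model. This is an illustrative sketch only, not StorNext's implementation: the repeated flush attempts are collapsed into a single deadline, the timeouts are the approximate values quoted above, and the names `FLUSH_TIMEOUT_S` and `flush_outcome` are hypothetical.

```python
from typing import Optional

# Approximate timeouts quoted above, in seconds, keyed by StorNext release family.
FLUSH_TIMEOUT_S = {"4.7.x": 45, "5.x": 150}


def flush_outcome(response_delay_s: Optional[float], version: str = "5.x") -> str:
    """Model the MDC's decision after asking the first client to flush.

    response_delay_s is how long the first client takes to confirm the flush,
    or None if it never responds.
    """
    timeout = FLUSH_TIMEOUT_S[version]
    if response_delay_s is not None and response_delay_s <= timeout:
        # Flush confirmed in time: the file is opened in shared mode for both clients.
        return "shared"
    # No confirmation in time: the RAS message is generated and the second
    # client is allowed to open the file for writing.
    return "timeout"
```

For example, a client that takes 60 seconds to respond falls into the "timeout" case under the 4.7.x value but into the "shared" case under the 5.x value, which is why the same slow client may trigger this message on one release and not on another.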