MDC Node Tools and Diagnostics

SNFS Logs

SNFS logging can be found in three different areas: the cvlogs, the nssdbg.out file, and the system logs.

The cvlogs

The cvlogs contain messages from the fsm processes. They can be found at the following locations.

 

Linux/Unix: /usr/cvfs/data/<fsname>/log/cvlog*

Windows: C:\Program Files\StorNext File System\data\<fsname>\log\cvlog*

 

Look in the cvlogs for error messages or disk latency messages. High disk latency can affect the overall performance of SNFS.
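As a quick triage step, the cvlogs can be scanned for error and latency messages with grep. The sketch below builds a throwaway sample log (the message text is made up; actual cvlog wording varies by SNFS release) and scans it the same way you would scan /usr/cvfs/data/&lt;fsname&gt;/log/cvlog* on a live MDC.

```shell
# Build a disposable sample cvlog; on a real MDC, point grep at
# /usr/cvfs/data/<fsname>/log/cvlog* instead (message text here is illustrative).
tmp=$(mktemp -d)
cat > "$tmp/cvlog" <<'EOF'
[0422 10:01:02] INFO  fsm started
[0422 10:05:33] WARN  disk latency 850ms on stripe group sg1
[0422 10:06:10] ERROR I/O error on LUN 3
EOF

# Pull out anything that looks like an error or a latency complaint.
grep -iE 'error|latency' "$tmp"/cvlog*

rm -rf "$tmp"
```

On a live system, piping the result through `tail` keeps the output focused on the most recent events.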

The nssdbg.log

The nssdbg.log file contains messages from the fsmpm process. This file can be found in the following locations.

Linux/Unix: /usr/cvfs/debug/nssdbg.out

Windows: C:\Program Files\StorNext File System\debug\nssdbg.out

 

 

Key entries to look for in nssdbg.out include:

  1. Disk discovery lists - when the fsmpm scans disks, it prints a listing of the disks it finds. If fsm processes are not starting or clients are not mounting, verify that all disks are seen by the fsmpm and that all labels are as expected.
  2. fsm start and registration messages
  3. Activation voting activity
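A filtered grep can pull those three categories out of nssdbg.out quickly. The patterns below are illustrative guesses at the message wording, which differs between SNFS releases; the temporary sample file stands in for /usr/cvfs/debug/nssdbg.out.

```shell
# Stand-in for /usr/cvfs/debug/nssdbg.out (message wording is illustrative).
tmp=$(mktemp -d)
cat > "$tmp/nssdbg.out" <<'EOF'
[0422 09:58:01] resolved scsi device /dev/sdb label snfs_meta1
[0422 09:58:02] resolved scsi device /dev/sdc label snfs_data1
[0422 09:58:05] FSM snfs1 registered at port 5429
[0422 09:59:12] vote for activation of snfs1
EOF

# Disk discovery: confirm every expected disk and label is listed.
grep -i 'label' "$tmp/nssdbg.out"
# FSM start/registration messages and activation voting activity.
grep -iE 'registered|activation|vote' "$tmp/nssdbg.out"

rm -rf "$tmp"
```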

 

QuStats

 

The qustats measure overall metadata statistics, physical I/O, VOP statistics, and client-specific VOP statistics.

The overall metadata statistics include journal, thread (not currently available), and cache information. All of these can be affected by changing a file system's configuration parameters, for example increasing the journal size, increasing the thread pool (not currently available), or increasing cache sizes.
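Before tuning any of these, it helps to see the current values. The sketch below greps a sample file-system configuration for journal and cache settings; the attribute names (`journalSize`, `bufferCacheSize`) and the /usr/cvfs/config path mentioned in the comment are assumptions based on typical StorNext .cfgx layouts, so verify them against the documentation for your release.

```shell
# Hypothetical .cfgx fragment; attribute names are assumptions. On a real
# MDC you would grep the file under /usr/cvfs/config/ for your file system.
tmp=$(mktemp)
printf '<config journalSize="67108864" bufferCacheSize="268435456">\n' > "$tmp"

# Show the journal and cache settings currently in effect.
grep -oE '(journalSize|bufferCacheSize)="[0-9]+"' "$tmp"

rm -f "$tmp"
```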

The physical I/O statistics show the number and speed of disk I/Os. Poor numbers can indicate hardware problems or insufficient capacity.

The VOP statistics show what clients are doing, which can show where workflow changes may improve performance.

The client-specific VOP statistics can show which clients are generating the VOP requests.
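A small awk pass can summarize this kind of data, for example totalling VOP counts per client to find the busiest one. The two-column "client count" layout below is a hypothetical stand-in for the real qustat table format, which is wider and release-dependent.

```shell
# Hypothetical qustat-style table: client name, VOP count per sample.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
client-a 120
client-b 45
client-a 80
EOF

# Sum VOP counts per client and list the busiest clients first.
awk '{ vops[$1] += $2 } END { for (c in vops) print c, vops[c] }' "$tmp" \
  | sort -k2 -rn

rm -f "$tmp"
```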

 

Key Stat Tables

There are tools that can be used to parse qustat data. Details can be found in the Qustat information wiki.

Storage Manager Logs

<please add details here>

System logs

 

Linux: /var/log/messages

Windows: Look at the Event Viewer

 

 

Watch for both system-related errors and StorNext-related errors. For example:

  1. Storage hardware errors such as I/O errors and SCSI sense messages indicate that the HBA, SAN, and storage should be inspected to ensure that everything is stable. Just because there are no obvious errors in the array logs or switch logs doesn't mean everything is OK.
  2. Networking errors such as network disconnects or timeouts may indicate connectivity problems. Network settings should be checked. If bonded NICs are used, verify that all the necessary connectivity is present and that the switch is capable of, and configured for, the bonding. NICs, switches, and cables should be verified.
  3. Watch for indications of low memory or system disk events.
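One grep per problem class keeps a first pass over /var/log/messages manageable. The log entries in the sketch below are made up for illustration, and the patterns are starting points rather than an exhaustive set.

```shell
# Stand-in for /var/log/messages (entries are made up for illustration).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Apr 22 10:01:01 mdc kernel: sd 0:0:0:1: [sdb] Sense Key : Medium Error
Apr 22 10:02:14 mdc kernel: bond0: link status down for interface eth1
Apr 22 10:03:40 mdc kernel: Out of memory: Kill process 4242
EOF

# One pattern per problem class: storage, network, memory.
grep -iE 'sense key|i/o error' "$tmp"
grep -iE 'link status|timeout|disconnect' "$tmp"
grep -iE 'out of memory|oom' "$tmp"

rm -f "$tmp"
```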

 

 

Notes

Do we have a tool to record the metadata requests and operations from StorNext clients on the MDC server side? We know this will affect performance a little, but it can help the end user audit their system operations. SNFS is a shared file system; some customers complained that they found files lost but did not know which SAN clients performed the delete operations.

Note by Harvey Zeng on 05/22/2013 05:50 PM

I wrote a simple tool that looks for journal waits; it does nothing more than tell you whether they happened. Here's an output sample. Maybe this is the amount of info desired? Hints on whether to go look in depth? It does assume the impact of journal waits is understood.

# journal_waits.pl --qdir=/usr/cvfs/qustats/FSM/lmccvfsck/lmc-vg.mdh.quantum.com/

 

Opened /usr/cvfs/qustats/FSM/lmccvfsck/lmc-vg.mdh.quantum.com/

 

 81 files inspected with 0 containing journal waits
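The core of such a checker can be re-sketched in a few lines of shell. This assumes a hypothetical dump layout in which a file recording a journal wait contains the phrase "jrnl wait"; the real journal_waits.pl and the actual qustat file format may differ.

```shell
# Count qustat dump files that mention journal waits (the 'jrnl wait'
# marker and the sample files are hypothetical).
qdir=$(mktemp -d)
printf 'stat: jrnl wait 3\n'  > "$qdir/sample1"
printf 'stat: cache hit 99\n' > "$qdir/sample2"

total=0 waits=0
for f in "$qdir"/*; do
  [ -f "$f" ] || continue
  total=$((total + 1))
  grep -qi 'jrnl wait' "$f" && waits=$((waits + 1))
done
echo "$total files inspected with $waits containing journal waits"

rm -rf "$qdir"
```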

 


 

 

Note by Laurie Costello on 05/22/2013 05:35 PM

Can we provide a tool to filter these logs to find performance issues or other problems? Such a tool would help analyze these logs.

Note by Harvey Zeng on 04/24/2013 10:00 AM


This page was generated by the BrainKeeper Enterprise Wiki, © 2018