SR3587672 Fragmentation causing OpHangLimitSecs Panic after File truncation
SR Information: 3587672 Keygene
Problem Description: File Fragmentation causing OpHangLimitSecs Panic after File truncation
Product / Software Version:
MDC: SNFS 4.3.3, RHEL 6.3, Kernel 2.6.32-279.el6.x86_64
Overview
The FSM panicked twice due to OpHangLimitSecs after a heavily fragmented file was truncated.
OpHangLimit
The purpose of this limit is to terminate an fsm that is stuck, so that all users don't end up waiting for a stuck operation to complete.
Think of "stuck" as meaning a software bug that causes a deadlock between threads. The file system has to restart, but clients reconnect and usually don't notice unless they are sensitive to the timing. If a client is just doing I/O and doesn't need any space allocated for a file, it doesn't even need to talk to the fsm, so there is no impact. (That's not a blanket statement - it could need more information from the server about currently allocated space, but it's generally true.)
The operation named in the message is what was detected as hung. It's something like VOP_VopDirAttrV2. This can be a hint that fragmentation is the problem, but it could be something entirely different, like a real bug. For example, if the operation is a remove, we could be taking a long time to traverse all the extents. The problem can't be diagnosed from the panic message alone.
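To make the mechanism concrete, here is a minimal sketch of the general watchdog idea, assuming a single in-flight operation; the names, structure, and limit value are illustrative, not the actual fsm implementation:

/* Illustrative OpHangLimitSecs-style watchdog -- a sketch, not fsm code. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define OP_HANG_LIMIT_SECS 180            /* illustrative limit */

static volatile time_t op_start;          /* 0 = no operation in flight */

static void op_begin(void) { op_start = time(NULL); }
static void op_end(void)   { op_start = 0; }

/* If an operation runs longer than the limit, terminate the process so
 * clients reconnect after a restart instead of waiting forever. */
static void *ophang_watchdog(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
        if (op_start != 0 && time(NULL) - op_start > OP_HANG_LIMIT_SECS) {
            fprintf(stderr, "PANIC: OpHangLimitSecs exceeded\n");
            abort();                      /* leaves the corefile we analyze below */
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t wd;
    pthread_create(&wd, NULL, ophang_watchdog, NULL);
    op_begin();                           /* simulate an operation that never completes */
    pause();
    op_end();
    return 0;
}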
The best info to obtain to diagnose these is a stack trace from the corefile. Have the customer run gdb against the corefile and get the stack traces.
Example:
script gdb.output
gdb fsm corefile
(gdb) thread apply all bt
(gdb) quit
exit
The file gdb.output now contains backtraces for all threads.
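The same backtraces can also be collected non-interactively with gdb's batch mode:
gdb -batch -ex "thread apply all bt" fsm corefile > gdb.output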
Symptoms & Identifying the problem
## 1 ## Log Review:
/var/log/messages
Sep 11 02:16:07 snmdc1 fsm[64216]: StorNext FSS 'snfs1[0]': PANIC: /usr/cvfs/bin/fsm "OpHangLimitSecs exceeded VOP-Class-0 Type-9 Subtype-77 ticks Conn[4] Thread-0x7f8ec4d2f700
We get the following information from this Panic message:
Filesystem Name: snfs1
What type of message caused the error:
VOP-Class-0 == MsgClassFSM Source: Class
Type-9 == MsgTypeFileOp Source: Type
Subtype-7 == VopSubTypeGetResyncAttrV2 Source: Sub-Type
The Thread which was stuck: 0x7f8ec4d2f700
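As a cross-check, the mapping above can be written out in code. The numeric values below are only the ones identified in this case (the complete enums live in the fsm sources), so treat this as an illustrative sketch:

#include <stdio.h>

/* Only the values decoded in this case; the full tables are in the fsm sources. */
static const char *class_name(int c)   { return c == 0 ? "MsgClassFSM" : "unknown"; }
static const char *type_name(int t)    { return t == 9 ? "MsgTypeFileOp" : "unknown"; }
static const char *subtype_name(int s) { return s == 7 ? "VopSubTypeGetResyncAttrV2" : "unknown"; }

int main(void)
{
    /* VOP-Class-0 Type-9 Subtype-7 from the panic line above */
    printf("%s / %s / %s\n", class_name(0), type_name(9), subtype_name(7));
    return 0;
}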
## 2 ## Troubleshooting:
1.) Identifying the stuck Thread
(gdb) thread apply all bt full
Thread 47 (Thread 0x7f8ec4d2f700 (LWP 64292)):
2.) Investigate the stuck Thread
(gdb) thread 47
[Switching to thread 47 (Thread 0x7f8ec4d2f700 (LWP 64292))]#0 0x0000003c1ca0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x0000003c1ca0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000004bc38d in rdwrlk_rlock (rdwrp=0x7f8d7ac784e0) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsmlib/rdwrlk.c:177
#2 0x00000000004641e5 in inodelock (ip=<value optimized out>, locktype=<value optimized out>) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:3913
#3 0x0000000000468011 in inodeget (inode=422212473391389, ipp=0x7f8ec4d2ed68, locktype=1, freeflag=0, error=0x7f8ec4d2e24c)
at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:4425
#4 0x0000000000469b07 in _Inodelookup (tp=0x0, inode=422212473391389, ipp=0x7f8ec4d2ed68, locktype=1, return_any_pending=0)
at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:7163
#5 0x0000000000476bfe in VopGetResyncAttr (pmsgq=0x7f8d580018d0, version=2) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inodeops.c:1529
#6 0x0000000000495956 in VopsRequest (pmsgq=0x7f8d580018d0) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/vops.c:578
#7 0x000000000047ae0e in MsgDecoderClassFSM (arg=0x7f8d580018d0) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/msgdecode.c:352
#8 0x000000000047d739 in queue_thread (arg=0x1393f20) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/queues.c:547
#9 0x000000000047dde8 in __starter (arg=0x136da00) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/include/threads.h:257
#10 0x0000003c1ca07851 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003c1c6e767d in clone () from /lib64/libc.so.6
Stack Frames:
Frame #0 - Wait on a condition
Frame #1 - rdwrlk_rlock -> Acquire a reader lock
Frame #2 - inodelock -> Lock the in-core inode
Frame #3 - inodeget -> Find an inode in the in-core table
Frame #4 - _Inodelookup -> Find the inode based on the supplied number
Frame #5 - VopGetResyncAttr -> Return the attributes of an inode for client resynchronization
Frame #6 - VopsRequest -> Handle a VOP request
Note: The frame for the function in which execution is actually occurring is called the innermost frame. This is the most recently created of all the stack frames that still exist. GDB assigns numbers to all existing stack frames, starting with zero for the innermost frame, one for the frame that called it, and so on upward. Each frame is the data associated with one call to one function.
VopsRequest() has a client reference (pclient), so print it.
(gdb) frame 6
(gdb) print pclient->client_hostname
$15 = 0x7f8d2c000b30 "192.168.40.2"
vops.c:
VopsRequest(msgq_t* pmsgq) {
    int idx;
    client_t* pclient = NULL;
    ...
VopGetResyncAttr() - Return the attributes of an inode for client resynchronization
→ This indicates there is a reference to this file in the client's cache and its attributes need to be resynced.
Let's see which inode we are requesting and resolve its filename:
(gdb) frame 3
#3 0x0000000000468011 in inodeget (inode=422212473391389, ipp=0x7f8ec4d2ed68, locktype=1, freeflag=0, error=0x7f8ec4d2e24c)
at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:4425
(gdb) p *ip
... or directly ...
(gdb) p ip->i_idinode.u.xs._idi_xattr
$38 = "..003(RPL\000\001\000\000\367\206\f\234RPW_Cotton_assembly_14040118.tar", '\000' <repeats 207 times>
3.) Identify who holds the writer lock for this inode
(gdb) frame 1
#1 0x00000000004bc38d in rdwrlk_rlock (rdwrp=0x7f8d7ac784e0) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsmlib/rdwrlk.c:177
177 /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsmlib/rdwrlk.c
(gdb) print/x rdwrp->writer_id
$43 = 0x7f8ec4ef6700
Note: The thread ID is used as the writer ID.
Looking at the full backtrace, we see this is Thread 10 (Thread 0x7f8ec4ef6700 (LWP 64285)):
Thread-47 is waiting for Thread 10 to release the write lock on this inode.
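To see why a thread ID shows up in writer_id, here is a minimal sketch of a reader/writer lock that records its owning writer the way rdwrlk evidently does; the struct layout and function names (my_*) are illustrative, not the fsm source (on Linux, pthread_self() yields the pointer-sized value seen above):

#include <pthread.h>

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int             readers;    /* number of active readers */
    pthread_t       writer_id;  /* 0 while no writer holds the lock (Linux-specific) */
} my_rdwrlk_t;

void my_wlock(my_rdwrlk_t *l)
{
    pthread_mutex_lock(&l->mtx);
    while (l->readers > 0 || l->writer_id != 0)
        pthread_cond_wait(&l->cv, &l->mtx);
    l->writer_id = pthread_self();   /* the value print/x rdwrp->writer_id shows */
    pthread_mutex_unlock(&l->mtx);
}

void my_rlock(my_rdwrlk_t *l)
{
    pthread_mutex_lock(&l->mtx);
    while (l->writer_id != 0)        /* Thread-47 blocks here (frames #0/#1) */
        pthread_cond_wait(&l->cv, &l->mtx);
    l->readers++;
    pthread_mutex_unlock(&l->mtx);
}

void my_wunlock(my_rdwrlk_t *l)
{
    pthread_mutex_lock(&l->mtx);
    l->writer_id = 0;                /* release: wake anyone waiting on the cv */
    pthread_cond_broadcast(&l->cv);
    pthread_mutex_unlock(&l->mtx);
}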
4.) Investigate Thread 10, which holds the Writer Lock
(gdb) thread 10
[Switching to thread 10 (Thread 0x7f8ec4ef6700 (LWP 64285))]#0 0x0000003c1c6ab91d in nanosleep ()
(gdb) bt
#0 0x0000003c1c6ab91d in nanosleep () from /lib64/libc.so.6
#1 0x0000003c1c6ab790 in sleep () from /lib64/libc.so.6
#2 0x0000000000469ae3 in _Inodelookup (tp=0x1407770, inode=1125904498046133, ipp=0x7f8ec4ef5b98, locktype=2, return_any_pending=0)
at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:7156
#3 0x000000000046d8d5 in Inode_punch_hole (inode=422212473391389, ip=0x7f8d7ac78498, frblockstart=0, frblockend=18446744073709551615, totfreed=0x7f8ec4ef5d68, f_qd=0x7f8ec4ef5d30)
at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:11325
#4 0x0000000000475e4b in VopPunchHole_core (pmsgq=0x7f8d5ee67020, version=0) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inodeops.c:3121
#5 0x0000000000495ab0 in VopsRequest (pmsgq=0x7f8d5ee67020) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/vops.c:661
#6 0x000000000047ae0e in MsgDecoderClassFSM (arg=0x7f8d5ee67020) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/msgdecode.c:352
#7 0x000000000047d739 in queue_thread (arg=0x1393f20) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/queues.c:547
#8 0x000000000047dde8 in __starter (arg=0x1405860) at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/include/threads.h:257
#9 0x0000003c1ca07851 in start_thread () from /lib64/libpthread.so.0
(gdb) frame 5
(gdb) print pclient->client_hostname
$46 = 0x7f8d64070570 "192.168.40.1" → it's the MDC itself
(gdb) frame 3
#3 0x000000000046d8d5 in Inode_punch_hole (inode=422212473391389, ip=0x7f8d7ac78498, frblockstart=0, frblockend=18446744073709551615, totfreed=0x7f8ec4ef5d68, f_qd=0x7f8ec4ef5d30)
at /scm/nightly/VM-0-RedHat60AS-26x86-64-SP0/sn/snfs/fsm/inode.c:11325
* Inode_punch_hole
* Remove all blocks in the range specified. The range is INCLUSIVE.
* Entry: Inode is WRITE LOCKED.
* Assumes caller UPDATES the main inode and WRITE RELEASES it.
* Return: 0 = all done
(gdb) p ip->i_inode_number
$93 = 422212473391389
(gdb) p ip->i_idinode.u.xs._idi_xattr
$96 = "..RPW_Cotton_assembly_14040118.tar"
The inode stays write locked as long as extender inodes exist. We loop through all extenders for the given inode; on a heavily fragmented file this chain of extender inodes can be very long, which is how the operation can exceed OpHangLimitSecs:
do {
    ...
    /* Check for an extender inode - if any */
    if (tmp_ip->i_idinode.idi_extents[last_extent].idiext_flags ==
        ExtentFlagExtender) {
        /* This inode does have an extender - read it in */
        more = TRUE;
        tcinode = tmp_ip->i_idinode.idi_extents[last_extent].idiext_base;
        cinode = tcinode;
        if (Inodelookup(tp, cinode, &tmp_ip, INODEWRITE) < 0) {
            ...
        }
    }
    ...
} while (more == TRUE);
(gdb) print cinode
$7 = 1125904498046133
(gdb) frame 2
#2 0x0000000000469ae3 in _Inodelookup (tp=0x1407770, inode=1125904498046133,
ipp=0x7f8ec4ef5b98, locktype=2, return_any_pending=0)
Conclusion:
5.) Re-review the snapshot with the new information gathered
# ls -lisa RPW_Cotton_assembly_14040118.tar
8325405 0 -r-------- 1 root root 1937587148800 Apr 30 16:57 RPW_Cotton_assembly_14040118.tar
TSM Tac Log
Sep 10 14:35:04 snmdc1.keygene.local sntsm fs_cpyresp[52296]: E1202(9)<1325321218>:mda3rsp1798: FILEnm: NA key: 5496 ino: 8325405 gen: 3 - vsn: 001 retrieved due to read event
Sep 10 14:35:44 snmdc1.keygene.local sntsm fs_space[52305]: E1202(9)<00000>:mda1spc2500: Truncation incomplete for fs: /stornext/snfs1 dev_key: 1 affinity: n/a (97% full)
Sep 11 02:06:14 snmdc1.keygene.local sntsm fsrmdiskcopy[53631]: <1325423953> USER root: COMMAND: fsrmdiskcopy RPW_Cotton_assembly_14040118.tar
We can see the file had no blocks on disk and a size of ~2 TB. The file was manually retrieved and afterwards truncated by the user around the time of the panic. The file system fill level was around 97% at that time, and we can assume this caused heavy file fragmentation.
Note: Pre-SN5, we mask out the upper 32 bits when passing SNFS inode numbers over to Linux.
With SN5 metadata, the inode numbers visible in ls include all bits of the inode number.
(gdb) thread 10
(gdb) frame 3
#3 0x000000000046d8d5 in Inode_punch_hole (inode=422212473391389,
ip=0x7f8d7ac78498, frblockstart=0, frblockend=18446744073709551615,
totfreed=0x7f8ec4ef5d68, f_qd=0x7f8ec4ef5d30)
(gdb) print/x inode
$10 = 0x18000007f091d
(gdb) print/x 8325405
$11 = 0x7f091d
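A minimal illustration of that masking, using the values from this case:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t fsm_inode = 0x18000007f091dULL;       /* 422212473391389, as the fsm sees it */
    uint32_t linux_ino = (uint32_t)(fsm_inode & 0xffffffffULL);
    printf("%u\n", linux_ino);                     /* prints 8325405, as ls shows it */
    return 0;
}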
Reference:
Bug 38389 - expose 64 bit inode numbers to user space
Bug 27284 - Fix up 64-bit inode support on Linux clients
Resolutions/workarounds/fixes:
This works as designed!
The customer's long-term goal has to be the reduction of free-space fragmentation / file fragmentation.
As an optional workaround, OpHangLimitSecs can be adjusted in the file system configuration file, as sketched below.
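For example, in the old-style .cfg global section (value illustrative; on 4.x the XML .cfgx format has a corresponding opHangLimitSecs element - verify the exact syntax against snfs_config(5)):

OpHangLimitSecs 300

Raising the limit only gives slow operations more time before the panic fires; the underlying fragmentation still needs to be addressed.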
What we learn from this case:
- How to identify the stuck thread
- How to identify the Client IP Address from a VOP request
- How to identify the Filename of the inode within inodeget()
- Pre-SN5, we mask out the upper 32 bits when passing SNFS inode numbers over to Linux
- See KB Article: Severely fragmented file systems can degrade performance and trigger OpHangLimitSecs panics