StorNext File System Data Coherence

Applications can concurrently read and/or write the same file from different nodes. There are various ways to coordinate which region of a file each node is writing. For example, file locks can be used when I/O is done on the same file.
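As a rough illustration of the file-lock approach, the sketch below takes an exclusive POSIX byte-range lock around one node's region before writing it. It assumes a Unix-like client on which advisory fcntl locks are honored across nodes; the helper name write_region_locked, the 4096-byte offset, and the use of a temporary file are hypothetical stand-ins, not part of StorNext.

```python
import fcntl
import os
import tempfile


def write_region_locked(fd, offset, data):
    """Write data at offset while holding an exclusive lock on just that byte range."""
    # Blocks until no other process holds a conflicting lock on the range.
    fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
    try:
        os.pwrite(fd, data, offset)
    finally:
        # Release only the range that was locked above.
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)


# Demonstration on a temporary file (a stand-in for a shared file).
with tempfile.TemporaryFile() as f:
    write_region_locked(f.fileno(), 4096, b"node-A data")
    readback = os.pread(f.fileno(), 11, 4096)
```

Locking only the byte range being written lets other nodes work on disjoint regions of the same file without waiting.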

Prior to StorNext 6, when a file is open on multiple nodes with at least one writer, no buffer cache is used for the contents of the file. All I/O is done directly to or from the storage, usually using DMA I/O. This method of providing coherence has some limitations because DMA (direct) I/O has strict size and alignment requirements.

Beginning with StorNext 6, I/O coherence is handled using “tokens.” The configuration variable ioTokens can be set to false to re-enable the DMA coherence model. The default is true, which allows I/O to use the buffer cache on each node.

Note: All nodes accessing the file must be at StorNext 6 or above to use tokens.

When using DMA coherence, each I/O must be aligned on a “sector” boundary and the I/O size must be a multiple of the sector size. In addition, the memory buffer used for the I/O must be aligned. If the I/O does not meet these requirements, StorNext must do read-modify-write operations to handle the user’s I/O request. In the worst case, the I/O can be split into three pieces:

One piece for the front
One piece for the middle (the body)
One piece for the tail

This is done so that the DMA can follow the size and alignment requirements.
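The three-way split can be sketched with plain arithmetic. This is an illustration only, not StorNext's actual implementation; the function name split_io and the sector sizes used below are assumptions.

```python
def split_io(offset, length, sector):
    """Split a byte range into (head, body, tail) pieces.

    Each piece is an (offset, length) tuple, or None when that piece is empty.
    head and tail are partial-sector pieces; body covers whole sectors only.
    """
    if length <= 0:
        return (None, None, None)
    end = offset + length
    body_start = -(-offset // sector) * sector  # offset rounded up to a sector boundary
    body_end = (end // sector) * sector         # end rounded down to a sector boundary
    if body_start > body_end:
        # The whole request lies inside a single sector: one partial piece.
        return ((offset, length), None, None)
    head = (offset, body_start - offset) if body_start > offset else None
    body = (body_start, body_end - body_start) if body_end > body_start else None
    tail = (body_end, end - body_end) if end > body_end else None
    return (head, body, tail)


# A misaligned 2000-byte request at offset 100 with hypothetical 512-byte sectors.
pieces = split_io(100, 2000, 512)
```

Only the body can be transferred directly. The head and tail each cover part of a sector, so each needs a read-modify-write.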

For example, a write that is misaligned requires a read into an aligned buffer, a copy of the user’s data into that buffer, and then a write of the aligned buffer to the storage.
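The read, copy, and write steps can be modeled with a bytearray standing in for the storage. This is a simplified sketch: it applies read-modify-write to the whole covering range rather than only to the head and tail, and the name rmw_write and the 8-byte sector are assumptions chosen for readability.

```python
def rmw_write(storage, offset, data, sector):
    """Write data at offset using only sector-aligned transfers on storage."""
    start = (offset // sector) * sector               # round start down
    end = -(-(offset + len(data)) // sector) * sector  # round end up
    buf = bytearray(storage[start:end])                # 1. aligned read
    rel = offset - start
    buf[rel:rel + len(data)] = data                    # 2. copy user data into the buffer
    storage[start:start + len(buf)] = buf              # 3. aligned write back


storage = bytearray(b"." * 16)
rmw_write(storage, 3, b"ABC", 8)  # misaligned 3-byte write inside the first sector
```

After the call, the first sector holds the new bytes at offsets 3 to 5, and the surrounding bytes are the ones that were read in step 1.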

If two nodes are doing writes on the same file at adjacent, but misaligned locations, it is possible for one node to overwrite all or part of the other node's write. This happens if the other node's write occurs during the read-modify-write that is required to align the DMA I/O. The aligned region can cover the head or tail of the other node's range.
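The overwrite can be reproduced in a small simulation by interleaving the two nodes' read-modify-write steps. The helper names begin_rmw and finish_rmw, the 8-byte sector, and the bytearray standing in for shared storage are all assumptions made for illustration.

```python
def begin_rmw(storage, offset, data, sector):
    """Aligned read plus copy; returns (start, buf) for the pending aligned write."""
    start = (offset // sector) * sector
    end = -(-(offset + len(data)) // sector) * sector
    buf = bytearray(storage[start:end])
    rel = offset - start
    buf[rel:rel + len(data)] = data
    return start, buf


def finish_rmw(storage, start, buf):
    """Aligned write of the previously prepared buffer."""
    storage[start:start + len(buf)] = buf


storage = bytearray(b"." * 16)

# Node A writes bytes 0-3 and node B writes bytes 4-7: adjacent ranges, same sector.
a = begin_rmw(storage, 0, b"AAAA", 8)  # A reads the sector
b = begin_rmw(storage, 4, b"BBBB", 8)  # B reads the same sector before A writes it
finish_rmw(storage, *a)                # sector is now "AAAA...."
finish_rmw(storage, *b)                # B writes back stale bytes over A's data
```

Had the two operations run one after the other, the sector would contain both writes. With this interleaving, node B's buffer still holds the old bytes for A's range, so A's write is lost.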

To avoid this problem, follow the three requirements below for all I/O:

The buffer passed on the read or write system call must be aligned using an API such as posix_memalign. The value passed as the alignment must be the PAGESIZE of the given machine, usually 4096 bytes.
The offset into the file must fall on a PAGESIZE boundary.
The length of the I/O must be a multiple of PAGESIZE.

Note: If the sector size of the storage in use is larger than PAGESIZE, then that size must be used instead.
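A minimal sketch of checking the three requirements before issuing an I/O follows. Python's standard library has no posix_memalign, so this example substitutes an anonymous mmap region, which starts on a page boundary. That substitution guarantees PAGESIZE alignment only, so it is not sufficient when the sector size is larger. The helper names are hypothetical.

```python
import ctypes
import mmap


def io_alignment(sector_size=0):
    """Alignment to use: PAGESIZE, or the sector size when that is larger."""
    return max(mmap.PAGESIZE, sector_size)


def alloc_page_aligned(length):
    """Anonymous mmap region; its start address is page-aligned."""
    return mmap.mmap(-1, length)


def buffer_address(buf):
    """Memory address of the first byte of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def is_aligned_io(buf_addr, offset, length, align):
    """True when buffer address, file offset, and length all meet the alignment."""
    return (
        buf_addr % align == 0
        and offset % align == 0
        and length > 0
        and length % align == 0
    )


align = io_alignment()
buf = alloc_page_aligned(2 * align)
ok = is_aligned_io(buffer_address(buf), 4 * align, 2 * align, align)
```

An application could run such a check before each read or write call and fall back to a padded, aligned request when it fails.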

For additional information, see the description of the ioTokens variable in snfs_config(5) in the Man Pages Reference Guide.