Deduplication and Truncation

Let's look again at the directory from the previous section, which contains the three files f.2m, f.4m, and g.4m. Running the Linux command "ls -ls" in that directory displays the following:

total 10240

2048 -rw-r--r-- 1 root root 2097152 Jan 26 14:22 f.2m

4096 -rw-r--r-- 1 root root 4194304 Jan 26 14:22 f.4m

4096 -rw-r--r-- 1 root root 4194304 Jan 26 14:23 g.4m

The first column on the left displays the number of blocks (1024 bytes per block) allocated to each file. The column before the date displays the file size in bytes.
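The same two numbers can be read programmatically. The following is a minimal Python sketch, not part of StorNext, that returns the allocated block count and the byte size for one file. It assumes a POSIX system where the stat call reports st_blocks in 512-byte units, so the count is converted to the 1024-byte blocks described above. The helper names to_ls_blocks and blocks_and_size are illustrative only.

```python
import os

POSIX_BLOCK_UNIT = 512   # st_blocks is counted in 512-byte units on Linux
LS_BLOCK_SIZE = 1024     # block size used in the listing above


def to_ls_blocks(st_blocks):
    """Convert an st_blocks count to 1024-byte blocks, rounding up."""
    allocated_bytes = st_blocks * POSIX_BLOCK_UNIT
    return (allocated_bytes + LS_BLOCK_SIZE - 1) // LS_BLOCK_SIZE


def blocks_and_size(path):
    """Return (allocated 1024-byte blocks, file size in bytes) for path."""
    info = os.stat(path)
    return to_ls_blocks(info.st_blocks), info.st_size
```

For f.2m in the listing above, a fully allocated 2097152-byte file occupies 4096 units of 512 bytes, which the conversion reports as 2048 blocks.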

StorNext can truncate files that have been deduplicated. By “truncate” we mean that the disk blocks for the file have been freed. If the deduplicated files shown above are truncated, the "ls -ls" command displays the following:

total 0

0 -rw-r--r-- 1 root root 2097152 Jan 26 14:22 f.2m

0 -rw-r--r-- 1 root root 4194304 Jan 26 14:22 f.4m

0 -rw-r--r-- 1 root root 4194304 Jan 26 14:23 g.4m

There are no blocks in any of the three files, although each file retains its correct size.
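The pattern in this listing, a nonzero size with zero blocks, can be checked for in a script. The sketch below is a heuristic under stated assumptions, not a StorNext interface: it only compares the size and block count reported by the operating system on a POSIX system, and a completely sparse file would match as well. The function names are hypothetical.

```python
import os


def looks_truncated(st_size, st_blocks):
    """True when a file reports a nonzero size but no allocated blocks."""
    return st_size > 0 and st_blocks == 0


def find_block_free_files(directory):
    """List regular files in directory that have a size but no disk blocks."""
    matches = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_file(follow_symlinks=False):
            continue  # skip directories, symlinks, and special files
        info = entry.stat(follow_symlinks=False)
        if looks_truncated(info.st_size, info.st_blocks):
            matches.append(entry.name)
    return matches
```

Run against the directory shown above after truncation, a check of this kind would be expected to list all three files, since each reports 0 blocks and a nonzero size.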

As an exercise, in the previous "ls -l" and "ls -ls" examples, what does the line that says "total some_number" signify?

When an application or command accesses data in a truncated file, StorNext retrieves the data it needs from the blockpool. For a small file, this may be the entire file. For a larger file, StorNext retrieves a portion of the file at least large enough to contain the requested region. If you read the entire file, the entire file is retrieved.
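To make "at least large enough to contain the file region required" concrete, here is a small Python sketch of one way such a range could be computed. It is an illustration only: the chunk_size parameter is a hypothetical retrieval granularity, and nothing here describes how StorNext actually chooses what to retrieve.

```python
def region_to_retrieve(file_size, offset, length, chunk_size):
    """Return the (start, end) byte range to fetch for a read request.

    chunk_size is a hypothetical retrieval granularity, not a StorNext
    setting. The requested range is widened to chunk boundaries and then
    clipped to the end of the file.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must not be negative")
    read_end = min(offset + length, file_size)
    if offset >= read_end:
        return offset, offset  # nothing to fetch
    start = (offset // chunk_size) * chunk_size
    end = -(-read_end // chunk_size) * chunk_size  # round up to a boundary
    return start, min(end, file_size)
```

With an assumed chunk size of 1048576 bytes, a 100000-byte read at offset 1500000 of a 4194304-byte file yields the range (1048576, 2097152), a 10-byte read of a 4096-byte file yields the whole file (0, 4096), and a read of all 4194304 bytes yields (0, 4194304).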

Truncation provides the mechanism by which file system storage use may be reduced. A truncated file takes no space in its file system, but its BLOBs still require space in the blockpool. If we receive deduplication benefit (that is, if the same BLOB data occurs in more than one place), then the blockpool uses less space than the files originally used in the file system.
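The space comparison can be worked through with a short Python sketch. The BLOB identifiers and sizes used with it are made up for illustration, and the calculation ignores any blockpool metadata or overhead; it only shows that a BLOB shared by several files is counted once.

```python
def space_summary(files):
    """Compare original file data with unique BLOB data.

    files maps a file name to a list of (blob_id, blob_size) pairs.
    Returns (original_bytes, blockpool_bytes).
    """
    original_bytes = sum(size for blobs in files.values() for _, size in blobs)
    unique_blobs = {}
    for blobs in files.values():
        for blob_id, size in blobs:
            unique_blobs[blob_id] = size  # a repeated BLOB is stored once
    return original_bytes, sum(unique_blobs.values())


MIB = 1024 * 1024
# Hypothetical layout: f.4m begins with the same two BLOBs as f.2m.
example = {
    "f.2m": [("A", MIB), ("B", MIB)],
    "f.4m": [("A", MIB), ("B", MIB), ("C", MIB), ("D", MIB)],
    "g.4m": [("E", MIB), ("F", MIB), ("G", MIB), ("H", MIB)],
}
```

Under this assumed layout, space_summary(example) returns 10485760 bytes of original file data against 8388608 bytes of unique BLOB data, because BLOBs A and B are shared by f.2m and f.4m.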