Real-time I/O
A process requests real-time (ungated) I/O by using the SNFS External API SetRtio call (F_SETRIO ioctl). A library function is included in the External API sample source code that provides all the required cross-platform handling.
As an example, assume that a video playback application requires a constant rate of 186 MB/sec to correctly display images without dropping any frames. The application gates itself; that is, it requests I/O at a rate sufficient to display each image correctly. QOS provides a mechanism so that other I/O requests do not perturb the real-time display.
In the following example, assume the I/O subsystem has been qualified at 216 MB/sec. The file system block size is 4 KB. The disk subsystem is actually a large RAID array that internally maps many drives to a single LUN. There are four LUNs in the stripe group; each LUN is optimized for a 1.5 MB transfer. This corresponds to the following in the FSM configuration file, shown first in XML format and then in the older text format:
<stripeGroup index="1" name="MyStripeGroup" stripeBreadth="384" realTimeMB="216">
<disk index="0" diskLabel="CvfsDisk0" diskType="VideoDrive"/>
<disk index="1" diskLabel="CvfsDisk1" diskType="VideoDrive"/>
<disk index="2" diskLabel="CvfsDisk2" diskType="VideoDrive"/>
<disk index="3" diskLabel="CvfsDisk3" diskType="VideoDrive"/>
</stripeGroup>
[StripeGroup MyStripeGroup]
StripeBreadth 384
Node CvfsDisk0 0
Node CvfsDisk1 1
Node CvfsDisk2 2
Node CvfsDisk3 3
Rtmb 216
Also, assume there is only one stripe group for user data in the file system. As recommended by Quantum, there may be other stripe groups for metadata and journal that are not shown.
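The numbers in this example can be checked with a little arithmetic. The following is a minimal sketch, not part of the product: it assumes that StripeBreadth is a count of file system blocks (which is what makes 384 line up with the 1.5 MB transfer size) and that 1 MB means 1024 x 1024 bytes. All variable names are illustrative.

```python
# Values taken from the example configuration above.
BLOCK_SIZE_BYTES = 4 * 1024     # file system block size (4 KB)
STRIPE_BREADTH_BLOCKS = 384     # StripeBreadth, assumed to be in blocks
LUN_COUNT = 4                   # LUNs in the stripe group
QUALIFIED_MB_PER_SEC = 216      # Rtmb / realTimeMB
STREAM_MB_PER_SEC = 186         # rate the playback application needs

MB = 1024 * 1024                # assumption: binary megabytes

# Amount written to one LUN before moving on to the next LUN.
per_lun_transfer_mb = STRIPE_BREADTH_BLOCKS * BLOCK_SIZE_BYTES / MB

# Amount covered by one pass across all LUNs in the stripe group.
full_stripe_mb = per_lun_transfer_mb * LUN_COUNT

# Difference between the qualified rate and the stream's requirement.
headroom_mb_per_sec = QUALIFIED_MB_PER_SEC - STREAM_MB_PER_SEC

print(per_lun_transfer_mb)   # 1.5
print(full_stripe_mb)        # 6.0
print(headroom_mb_per_sec)   # 30
```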
Initially, all stripe groups in the file system are in non-real-time mode. Clients make their requests directly to the I/O subsystem without any gating. In our example, the process requires 186 MB/sec and the system designers know there will never be a need to support more than one stream at 186 MB/sec.
The SetRtio request has a number of flags and parameters to control its operation. These are all documented in the StorNext File System API Guide that describes the external API in detail. For this example, set the handle for the indicated stripe group using the RT_SET parameter.
In most cases, system designers ensure that the amount of RTIO is not oversubscribed. This means that processes will not ask for more RTIO than is specified in the configuration file. However, it is possible to request more RTIO than is configured. The API uses the RT_MUST flag to indicate that the call must succeed with the specified amount. If the flag is clear, the call allocates as much as it can. In both cases, the amount allocated is returned to the caller.
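The effect of the RT_MUST flag can be illustrated with a small model. This is a sketch of the behavior described above, not the SetRtio implementation: the function name grant_rtio and its parameters are hypothetical, and it assumes that an RT_MUST request that cannot be met in full fails and allocates nothing.

```python
def grant_rtio(requested, available, must):
    """Return the amount of RTIO granted to a caller (illustrative model).

    requested -- amount of RTIO the caller asks for
    available -- amount of RTIO the stripe group can still hand out
    must      -- True if the caller set RT_MUST
    """
    available = max(available, 0)
    if requested <= available:
        # The full request fits, with or without RT_MUST.
        return requested
    if must:
        # ASSUMPTION: a failed RT_MUST request allocates nothing.
        return 0
    # Flag clear: allocate as much as possible.
    return available


print(grant_rtio(186, 216, must=True))    # 186
print(grant_rtio(250, 216, must=True))    # 0
print(grant_rtio(250, 216, must=False))   # 216
```

In each case the return value plays the role of the amount allocated that is reported back to the caller.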
The SetRtio call accepts two different types of handles. The first is a handle to the root directory. In this mode the stripe group is put into real-time mode, but no specific file handle is tagged as being ungated. Real-time I/O continues on the stripe group until it is explicitly cleared with a SetRtio call on the root directory that specifies the RT_CLEAR flag, the file system is unmounted, or the system is rebooted. It is up to the application to make a subsequent call to EnableRtio (F_ENABLERTIO) on a specific handle.
If the handle in the SetRtio call refers to a regular file, it is the equivalent of a SetRtio call on the root directory followed by an EnableRtio call. The file handle will be ungated until it is closed, cleared (RT_CLEAR in a SetRtio call), or disabled (DisableRtio). When the handle is closed, the amount of real-time I/O is released back to the system. This causes the FSM to readjust the amount of bandwidth available to all clients by issuing a series of callbacks.
The client automatically issues a call to the FSM with the RT_CLEAR flag specifying the amount of real-time I/O set on the file. If multiple handles are open on the file, each with a different amount of real-time I/O, only the last file close triggers the releasing action, and the aggregate RTIO for all handles is released at that point.
This automatic clearing of real-time I/O is carried out in the context of the process that is closing the file. If the FSM cannot be reached for some reason, the request is enqueued on a daemon and the process closing the file is allowed to continue. In the background, the daemon attempts to inform the FSM that the real-time I/O has been released.
Different processes can share the same file in real-time and non-real-time mode. This is because the level of gating is at the handle level, not the file level. This allows a real-time process to perform ingest of material (video data) at the same time as non-real-time processes are performing other operations on the file.
In the illustration, Process A has ungated access to file foo. Processes B and C also are accessing file foo, but the client gates their I/O accesses. If multiple handles are open to the same file and all are in real-time mode, only the last close of the handle releases the real-time I/O back to the system. This is because on most platforms the file system is informed only on the last close of a file.
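Handle-level gating and release on last close can be sketched with a toy model. The class below is hypothetical and does not use the SNFS External API; it only mirrors the behavior described above, where each handle is either gated or ungated and the aggregate real-time I/O is released when the last handle on the file is closed.

```python
class RtFile:
    """Toy model of one file shared by several handles (not the SNFS API)."""

    def __init__(self, name):
        self.name = name
        self.handles = {}        # handle id -> RTIO amount, or None if gated
        self.next_id = 0
        self.aggregate_rtio = 0  # RTIO set on the file across all handles

    def open(self, rtio=None):
        """Open a handle; pass an RTIO amount to make it ungated."""
        hid = self.next_id
        self.next_id += 1
        self.handles[hid] = rtio
        if rtio is not None:
            self.aggregate_rtio += rtio
        return hid

    def is_gated(self, hid):
        return self.handles[hid] is None

    def close(self, hid):
        """Close a handle and return the RTIO released back to the system."""
        del self.handles[hid]
        if self.handles:
            return 0             # other handles still open: nothing released
        released, self.aggregate_rtio = self.aggregate_rtio, 0
        return released          # last close releases the aggregate


foo = RtFile("foo")
a = foo.open(rtio=100)   # Process A: ungated
b = foo.open()           # Process B: gated
c = foo.open(rtio=50)    # another real-time handle
print(foo.is_gated(a), foo.is_gated(b))            # False True
print(foo.close(a), foo.close(b), foo.close(c))    # 0 0 150
```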
The RT_NOGATE flag can also be used to indicate that a handle should not be gated, without specifying any amount of real-time I/O. This is useful for infrequently accessed files (such as index files) that should not be counted against the non-real-time I/O. System designers typically allow for some amount of overage in their I/O subsystem to account for non-gated files.
When the FSM receives a request for RTIO, it takes the amount reserved into consideration. The reserve amount functions as a soft limit that the FSM will not go beyond. The calculation for RTIO is as follows:
available_rtio = (rtio_limit) - (rtio_current) - (rtio_reserve)
In the above calculation, rtio_limit is the stripe group's maximum number of I/Os (the value of the Rtios parameter), rtio_current is the total number of currently reserved real-time I/Os, and rtio_reserve is the minimum number of I/Os reserved for non-real-time I/O (the value of RtiosReserve).
All internal calculations are done in terms of I/O/sec.
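The calculation above can be written as a short function. This is a minimal sketch with made-up sample values; the numbers below are hypothetical and are not derived from the example configuration, and all quantities are in I/O/sec.

```python
def available_rtio(rtio_limit, rtio_current, rtio_reserve):
    """Real-time I/Os per second still available on a stripe group.

    rtio_limit   -- maximum I/Os for the stripe group (Rtios)
    rtio_current -- real-time I/Os currently reserved
    rtio_reserve -- I/Os held back for non-real-time I/O (RtiosReserve)
    """
    return rtio_limit - rtio_current - rtio_reserve


# Hypothetical sample values, in I/O/sec.
print(available_rtio(rtio_limit=1000, rtio_current=600, rtio_reserve=100))  # 300
```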