Configuration
The primary configuration for QOS is in the FSM configuration file. No client configuration is required, although there is a QOS tuning parameter that can be specified when the file system is mounted.
Real-time I/O is based on well-formed I/O. This means that for the purposes of determining bandwidth rates, well-formed I/O is characterized as being a stripe width in size. This makes the best utilization of the disks in the stripe group and maximizes the transfer rate. Internally, non-real-time I/O is tracked by number of I/O operations per second. An I/O operation is a minimum of a file system block size, and a maximum of the file system block size multiplied by the stripe breadth.
Typically, it is easier to qualify an I/O subsystem in terms of MB/sec that can be sustained. However, internally the file system tracks everything on an I/O/sec basis. Note that the file system tracks only non-real-time I/O (that is, it gates only non-real-time I/O). An I/O is a minimum of the file system block size, and is typically the point at which the file system hands the request off to the disk driver (IoCallDriver in Windows, or a strategy call in UNIX).
The file system counts the number of I/Os that have taken place during a given second. If the number exceeds that which is allotted, the request is pended until I/O becomes available (typically in the next second). I/O is honored in FIFO fashion; no priority is assigned.
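The gating behavior described above can be illustrated with a small simulation. This is only a sketch of the idea (a per-second allotment with FIFO pending), not the file system's actual implementation; the function name `gate_ios` and its arguments are hypothetical.

```python
from collections import deque

def gate_ios(arrivals_per_second, ios_allowed_per_second):
    """Simulate per-second gating and return the I/Os issued in each second.

    arrivals_per_second: one list of request ids per simulated second.
    ios_allowed_per_second: hypothetical allotment of non-real-time I/Os.
    """
    pending = deque()  # FIFO queue of requests waiting for I/O to become available
    issued = []
    for arrivals in arrivals_per_second:
        pending.extend(arrivals)  # new requests queue behind older ones
        this_second = []
        # Issue requests in arrival order until the allotment is used up.
        while pending and len(this_second) < ios_allowed_per_second:
            this_second.append(pending.popleft())
        issued.append(this_second)
    return issued

schedule = gate_ios([["a", "b", "c"], ["d"], []], ios_allowed_per_second=2)
print(schedule)  # [['a', 'b'], ['c', 'd'], []]
```

Request "c" exceeds the allotment in the first second, so it is pended and issued first in the next second, ahead of "d".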
To convert between I/Os and MB/sec, SNFS uses a formula that treats every I/O as well-formed. The rationale is the way in which many video applications make real-time I/O requests. To optimize the disk subsystem, real-time I/Os are well-formed so they saturate the disks. In SNFS terminology, this would be an I/O that covers all of the disks in a stripe.
This can be expressed as follows:
I/Os/sec = MB/sec / (StripeBreadth * StripeDepth * file system block size)
For example, with a file system block size of 4k, a StripeBreadth of 384, and a StripeDepth of four, the number of well-formed I/Os/sec equivalent to 216 MB/sec would be 216 MB/sec / (384 * 4 * 4k). This is equivalent to 221184 k/sec / 6144k = 36 I/Os/sec.
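The arithmetic in this example can be checked with a short Python sketch. The function name and parameter names are illustrative only and do not correspond to any SNFS interface; the block size is given in kilobytes and 1 MB is taken as 1024 KB, as in the example.

```python
def mb_to_ios_per_sec(mb_per_sec, stripe_breadth, stripe_depth, fs_block_size_kb):
    """Convert a MB/sec rate into well-formed I/Os per second."""
    kb_per_sec = mb_per_sec * 1024
    # A well-formed I/O covers one full stripe line.
    well_formed_io_kb = stripe_breadth * stripe_depth * fs_block_size_kb
    return kb_per_sec / well_formed_io_kb

print(mb_to_ios_per_sec(216, 384, 4, 4))  # 36.0
```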
All storage subsystems are different, so users must qualify the I/O subsystem and determine the maximum amount of I/O bandwidth available. SNFS relies on the correct setting in the configuration file; if the storage system changes (for example, because of a new disk array), the user must re-qualify the I/O subsystem to determine the amount of bandwidth available. This amount is specified in the FSM configuration file. The user can also specify the minimum amount of bandwidth to be provided to non-real-time applications.
There are five keywords controlling QOS that can be specified in the stripe group section of the FSM configuration file. Not all keywords need be present. Typically, the user specifies the RTIO bandwidth in terms of either number of I/O operations per second (rtios) or megabytes per second (rtmb). Keywords are not case sensitive.
For a minimum configuration, only the real-time limit (either rtios or rtmb) need be specified. All other configuration variables default to reasonable values.
Name | Description | Default
--- | --- | ---
Rtios | The maximum number of real-time I/Os allowed in a stripe group during any one-second period. | 0 (no real-time)
Rtmb | Maximum amount of real-time MB/sec allowed on the stripe group during any one-second period. | 0 (no real-time)
RtiosReserve | Amount of reserve in I/Os/sec from the maximum allowed for non-real-time I/Os. Must be greater than the equivalent of 1 MB/sec or the amount that can be transferred to a single stripe line. | Equivalent to 1 MB/sec
RtmbReserve | Amount to reserve in MB/sec from the maximum allowed for non-real-time I/O. Must be greater than 1. | 1 MB/sec
RtTokenTimeout | Time in seconds to wait for clients to respond to a token callback. | 1.5 seconds
The limit is specified in terms of I/Os per second (parameter Rtios) or in terms of MB/sec (parameter Rtmb). The parameter names are not case sensitive. Note that I/Os per second are I/Os of any size to the disk subsystem. Either or both may be specified. If both are specified, the lower limit is used to throttle I/O. If neither is specified, no real-time I/O is available on the stripe group. These parameters are applied to a stripe group definition.
Example (Linux)
<stripeGroup index="1" name="MyStripeGroup" realTimeIOs="2048" realTimeMB="10">
</stripeGroup>
Example (Windows)
[StripeGroup MyStripeGroup]
Rtios 2048
Rtmb 10
The above example specifies that the storage system can support a maximum of 2048 I/Os per second at any instant, aggregate among all the clients, or 10 MB/sec, whichever is lower.
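The "whichever is lower" rule can be sketched as follows. This sketch assumes the two limits are compared by converting Rtmb to I/Os/sec with the well-formed I/O formula; the function name and the stripe geometry used in the example call (4k blocks, StripeBreadth 16, StripeDepth 4) are hypothetical and chosen only for illustration.

```python
def effective_rt_limit_ios(rtios, rtmb, stripe_breadth, stripe_depth, fs_block_size_kb):
    """Return the effective limit in I/Os/sec; 0.0 means no real-time I/O."""
    limits = []
    if rtios > 0:  # 0 is the default and means "not specified"
        limits.append(float(rtios))
    if rtmb > 0:
        # Assumption: Rtmb is converted using the well-formed I/O size.
        well_formed_io_kb = stripe_breadth * stripe_depth * fs_block_size_kb
        limits.append(rtmb * 1024 / well_formed_io_kb)
    # If both are given, the lower limit throttles I/O.
    return min(limits) if limits else 0.0

# Rtios 2048 and Rtmb 10 on the hypothetical geometry above.
print(effective_rt_limit_ios(2048, 10, 16, 4, 4))  # 40.0
```

With this geometry a well-formed I/O is 256 KB, so 10 MB/sec corresponds to 40 I/Os/sec, which is lower than 2048 and therefore becomes the throttle.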
Most real-time I/O requests will be a stripe line at a time to maximize performance. Non-real-time I/Os will be a minimum of a file system block size.
Note: It is important to realize that the rtios and rtmb settings refer to the total amount of sustained bandwidth available on the disk subsystem. Any I/O, either real-time or non-real-time, will ultimately be deducted from this overall limit. The calculations of available real-time and non-real-time are discussed later.
Specifying rtmb in the FSM configuration file is only recommended if all I/Os are well formed (that is, a full stripe width). Otherwise, the conversion between MB/sec and I/Os/sec using the well-formed I/O calculation could lead to unexpected results.
To change the RTIO parameters on the MDC, modify the /usr/cvfs/config/<fsname>.cfgx file, and restart StorNext services. Refer to the examples above for the Linux and Windows RTIO parameter names. You can also use the StorNext GUI.
To prevent deadlock, the QOS implementation never allows zero I/O/sec for non-real-time I/O. Otherwise, a system could block with many critical file system resources held, waiting for I/O to become available. This is especially true of flush-on-close I/O through the buffer cache. System hangs that occur because no I/O is available are extremely difficult to diagnose. For this reason, QOS always reserves some amount of I/O for non-real-time I/O.
The minimum amount of I/O reserved for non-real-time applications is one MB/sec. This can be changed with the RtiosReserve and RtmbReserve parameters in the stripe group section (again, the names are not case sensitive). If both are specified, the lower of the two amounts is chosen. This amount is shared by all non-real-time applications on each client.
Example (Linux)
<stripeGroup index="1" name="MyStripeGroup" realTimeIOsReserve="256" realTimeMBReserve="2">
</stripeGroup>
Example (Windows)
[StripeGroup MyStripeGroup]
RtiosReserve 256
RtmbReserve 2
The RtTokenTimeout parameter controls the amount of time the FSM waits for clients to respond to callbacks. In most normal SANs, the default setting is sufficient. This value may need to be changed for a SAN that has a mixture of client machine types (Linux, Windows NT, etc.) that all have different TCP/IP characteristics. Large numbers of clients (greater than 32) may also require increasing the parameter.
For example, if the FSM should ever fail, the clients will attempt to reconnect. When the FSM comes back online, the amount of time the clients take to re-establish their TCP/IP connection to the FSM can differ wildly. To avoid unnecessary timeouts, the RtTokenTimeout parameter can be increased, meaning the FSM waits longer for callback responses.
If a client times out on a token retraction, the original requestor receives an error from the FSM that includes the IP address of the offending client. This error is logged to syslog, and alternatively to the desktop on Windows clients. This can help in diagnosing reconnect failures, and in determining if the token time value should be increased.
When a client obtains a non-real-time I/O token from the FSM, the token allows the client a specific amount of non-real-time I/O. If the client is inactive for a period of time, the token is relinquished and the non-real-time I/O released back to the FSM for distribution to other clients. The timeout period is controlled by the nrtiotokenhold mount option on UNIX platforms, and the QOS Token Hold Time parameter in the mount options tab of the SNFS control panel on Windows platforms. The default is sixty (60) seconds.
This means that after sixty seconds without non-real-time I/O on a stripe group, the non-real-time token for that stripe group is released. The parameter should be specified in five (5) second increments. If it is not, it will be silently rounded up to the next five-second boundary. If the syslog level is set to debug, the file system dumps out its mount parameters so the value can be seen.
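The rounding rule for the token hold time can be illustrated with a short sketch. The function name is hypothetical; it only mirrors the "round up to the next five-second boundary" behavior described above and is not taken from the SNFS client code.

```python
def round_token_hold(seconds, increment=5):
    """Round a hold time in whole seconds up to the next multiple of increment."""
    # Negating before and after floor division gives ceiling division for integers.
    return -(-seconds // increment) * increment

print(round_token_hold(60))  # 60 (already on a five-second boundary)
print(round_token_hold(62))  # 65
```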