Callbacks

Communication between the FSM and the client rests on two mechanisms: callbacks and tokens. A callback is an unsolicited message from the FSM to the client requesting that the client adjust its real-time I/O parameters. The callback contains a token that specifies the amount of non-real-time I/O available on a stripe group.

Initially, all stripe groups in a file system are in non-real-time (ungated) mode. When the FSM receives the initial request for real-time I/O, it first issues callbacks to all clients informing them that the stripe group is now in real-time mode. The token accompanying the message specifies that nothing is available for non-real-time I/O. From this point on, clients must obtain a non-real-time token before they can do any non-real-time I/O.
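The client-side effect of this first callback can be sketched as follows. This is a minimal illustrative model in Python, not the actual client implementation: the class name, the method names, and the treatment of the token as a plain integer allowance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StripeGroupState:
    """Hypothetical client-side view of one stripe group."""
    realtime_mode: bool = False  # False means ungated (non-real-time) mode
    nonrt_token: int = 0         # non-real-time I/O allowance (unit assumed)

    def on_callback(self, realtime_mode: bool, token: int) -> None:
        # The callback carries the new mode and the non-real-time token.
        self.realtime_mode = realtime_mode
        self.nonrt_token = token

    def may_issue_nonrt_io(self, amount: int) -> bool:
        # Ungated: non-real-time I/O is unrestricted.
        if not self.realtime_mode:
            return True
        # Gated: the I/O must fit within the token currently held.
        return amount <= self.nonrt_token


state = StripeGroupState()
print(state.may_issue_nonrt_io(100))   # ungated, so allowed
state.on_callback(realtime_mode=True, token=0)
print(state.may_issue_nonrt_io(1))     # gated with an empty token, so refused
```

In this model the client would have to request a non-real-time token from the FSM before the second call could succeed.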

After sending out all callbacks, the FSM sets a timer based on the RtTokenTimeout value, which by default is set to 1.5 seconds. If all clients respond to the callbacks within the timeout value, the RTIO request succeeds, and a response is sent to the requesting client.

Figure 1: Callback Flow for Initial RTIO Request

In the above diagram, a process on client A requests some amount of RTIO in Step 1. Since this is the first request, the FSM issues callbacks to all connected clients (Steps 2-5) informing them that the stripe group is now in real-time mode. The clients respond to the FSM in Steps 6-9. After all the clients have responded, the FSM responds to the original requesting client in Step 10.
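The message sequence for one callback round can be simulated with a short Python sketch. It assumes four connected clients (A through D, with A as the requester) and models each client by the time it takes to answer; the function name, the log format, and the use of None for a client that never answers are illustrative assumptions, not part of the FSM.

```python
RT_TOKEN_TIMEOUT = 1.5  # seconds; the default RtTokenTimeout given in the text


def handle_initial_rtio_request(requester, response_delays,
                                timeout=RT_TOKEN_TIMEOUT):
    """Simulate one callback round.

    response_delays maps client name -> seconds until it answers,
    or None if it never answers (a modelling assumption).
    Returns (succeeded, first_non_responder, message_log).
    """
    log = [f"{requester} -> FSM: RTIO request"]

    # The FSM issues a callback to every connected client.
    for client in response_delays:
        log.append(f"FSM -> {client}: callback (real-time mode)")

    # Clients that answer within the timeout are recorded as responses.
    late = []
    for client, delay in response_delays.items():
        if delay is not None and delay <= timeout:
            log.append(f"{client} -> FSM: callback response")
        else:
            late.append(client)

    if late:
        # Timer expired: report the first client that did not respond.
        return False, late[0], log

    log.append(f"FSM -> {requester}: RTIO response")
    return True, None, log


ok, culprit, log = handle_initial_rtio_request(
    "A", {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4})
for step, message in enumerate(log, start=1):
    print(f"Step {step}: {message}")
```

With all four clients answering in time, the log contains ten messages, which lines up with Steps 1 through 10 in the description.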

If the timer expires and one or more clients have not responded, the FSM must retract the callbacks. It first issues a response to the requesting client containing the IP address of the first client that did not respond to the callback. This allows the requesting client to log the error with the IP address so system administrators have a chance of diagnosing the failure. The FSM then sends callbacks to every client that received the original callbacks, returning them to their original state. In our example, this sets the stripe group back to non-real-time mode.

After sending out the retraction callbacks, the FSM again waits for responses, using the RtTokenTimeout value as before. If a client still does not respond within the timeout value, the callbacks are retracted and sent out again. This repeats until all clients respond. While token retractions are in progress, real-time requests cannot be honored and are only enqueued.
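The retry behavior can be sketched as a loop. The Python example below is a simplified simulation under stated assumptions: each retraction attempt is represented by a dictionary of client response times, real-time requests arriving during a given attempt are supplied up front, and all names are hypothetical rather than taken from the FSM.

```python
from collections import deque

RT_TOKEN_TIMEOUT = 1.5  # seconds; the default RtTokenTimeout given in the text


def all_responded(delays, timeout):
    # None models a client that never answers (an assumption of this sketch).
    return all(d is not None and d <= timeout for d in delays.values())


def retract_callbacks(rounds, incoming_rt_requests, timeout=RT_TOKEN_TIMEOUT):
    """Repeat the retraction until every client answers in time.

    rounds: list of dicts (client -> response delay), one per attempt.
    incoming_rt_requests: dict mapping attempt number -> list of
        real-time requests that arrive during that attempt.
    Returns (attempts_needed, enqueued_requests).
    """
    pending = deque()
    for attempt, delays in enumerate(rounds, start=1):
        # Real-time requests cannot be honored now; they are only enqueued.
        pending.extend(incoming_rt_requests.get(attempt, []))
        if all_responded(delays, timeout):
            return attempt, list(pending)
    raise RuntimeError("simulation ran out of rounds before all clients responded")


rounds = [
    {"A": 0.1, "B": None},  # B never answers
    {"A": 0.1, "B": 2.0},   # B answers too late
    {"A": 0.1, "B": 0.3},   # everyone answers in time
]
attempts, queued = retract_callbacks(rounds, {1: ["req1"], 3: ["req2"]})
print(attempts, queued)
```

In this run the retraction completes on the third attempt, and both real-time requests that arrived in the meantime remain queued for the FSM to process afterwards.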