Next: Self-Protecting Up: Experimental Results Previous: Recovery-Friendly

Self-Tuning

The use of AIMD allows the stubs to adaptively discover the capacity of the system, without requiring an administrator to configure a system and a workload, and then run experiments to determine whether the system services the workload in an acceptable fashion. In the manual process, if the workload increases drastically, the configuration may need to be changed.

In SSM, the allowable amount of time for session state retrieval and storage is specified in a configurable timeout value. The system tunes itself using the AIMD mechanism, maximizing the number of requests that can be serviced within that time bound. SSM adapts gracefully to higher load: under overload, requests are rejected instead of allowing latency to increase beyond a reasonable threshold. If SSM is deployed in an environment with a pool of free machines, Pinpoint can monitor the number of rejected requests and start up new bricks to accommodate the increase in workload.
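As a rough illustration of the mechanism, the following minimal sketch shows an AIMD sending window of the kind a stub might maintain. It is not SSM's implementation: the class name, the increment, the decrease factor, the bounds, and the choice of a timeout as the decrease signal are all assumptions made for the example.

```python
class AimdWindow:
    """Additive-increase / multiplicative-decrease sending window (illustrative only)."""

    def __init__(self, initial=1.0, increase=1.0, decrease=0.5,
                 minimum=1.0, maximum=1024.0):
        # All constants here are hypothetical; the text does not specify them.
        self.size = initial        # current window size (may be fractional)
        self.increase = increase   # additive step applied on each success
        self.decrease = decrease   # multiplicative factor applied on each timeout
        self.minimum = minimum
        self.maximum = maximum
        self.in_flight = 0         # requests sent but not yet resolved

    def try_acquire(self):
        """Admit a request only if the window has room; otherwise reject it."""
        if self.in_flight < int(self.size):
            self.in_flight += 1
            return True
        return False

    def on_success(self):
        """A request completed within the timeout: grow the window additively."""
        self.in_flight -= 1
        self.size = min(self.maximum, self.size + self.increase)

    def on_timeout(self):
        """A request missed the timeout: shrink the window multiplicatively."""
        self.in_flight -= 1
        self.size = max(self.minimum, self.size * self.decrease)
```

In this sketch, a request that cannot acquire a window slot is rejected immediately rather than queued, which is one way to keep latency bounded under overload.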

SSM discovers the maximum throughput of the system correctly. Recall that in SSM, read and write requests are expected to complete within a timeout. We define goodput as the number of requests that complete within the specified timeout; requests that complete after the timeout are not counted toward goodput. Offered load is goodput plus all requests that fail. In this benchmark, W is set to 3, WQ is set to 2, R is set to 1, the timeout is set to 60 ms, and the size of state written is 8K. We use 3 bricks.
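The definitions above can be made concrete with a small sketch. The function name and the input representation (a list of latencies, with `None` marking a request that never completed) are assumptions for illustration, as is treating a latency exactly equal to the timeout as on time.

```python
def summarize(latencies_ms, timeout_ms=60.0):
    """Count goodput, failures, and offered load for one measurement interval.

    latencies_ms: one entry per request issued in the interval; a number is the
    completion latency in milliseconds, None means the request never completed
    (for example, it was rejected). This representation is hypothetical.
    """
    # Goodput: requests that completed within the timeout.
    good = sum(1 for l in latencies_ms if l is not None and l <= timeout_ms)
    # Offered load: goodput plus every request that failed (late or never completed).
    offered = len(latencies_ms)
    return {"goodput": good, "failed": offered - good, "offered": offered}
```

A request that completes after 75 ms, for instance, contributes to offered load but not to goodput under the 60 ms timeout used in this benchmark.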

First, we discover the maximum goodput of the basic system with no admission control or AIMD. We do so by varying the number of sending threads to see where goodput plateaus. We run two separate experiments: the first generates a load with 120 threads, corresponding to roughly 1900 requests per second, and the second uses 150 threads, corresponding to roughly 2100 requests per second. Figure 5 shows that goodput plateaus around 1900-2000 requests per second.

Figure 5: SSM running with 3 bricks, no AIMD or admission control. The graph on the top shows a load of 120 threads sending read and write requests of session state. The graph on the bottom shows a load of 150 sending threads. System throughput peaks at around 1900-2000 requests per second.

We continue increasing the load until goodput drops to zero. Goodput eventually drops to zero because requests arrive faster than the bricks can process them, and eventually the bricks spend all of their time fulfilling timed-out requests instead of doing useful work. As can be seen in the lightened portion of Figure 6, the bricks collapse under the load of 220 threads, or about 3400 requests per second, and system goodput falls to zero at time 11.

Figure 6: SSM running with 3 bricks, no AIMD or admission control. The bricks collapse at time 11 under the load of 220 threads generating requests.
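The collapse mechanism can be illustrated with a toy model. The sketch below is not the benchmark described here and will not reproduce its timing: it assumes a single deterministic FIFO server that processes every request in arrival order, including requests that have already timed out. The function name and parameters are hypothetical; the rates in the usage note are the approximate figures from the text.

```python
def simulate(arrival_rate, service_rate, timeout_s=0.060, seconds=5):
    """Return goodput per one-second bucket for a deterministic FIFO server.

    arrival_rate and service_rate are in requests per second. With no admission
    control, every request is eventually served, even if it is already too late.
    """
    interarrival = 1.0 / arrival_rate
    service_time = 1.0 / service_rate
    server_free_at = 0.0
    goodput = [0] * seconds
    for i in range(int(arrival_rate * seconds)):
        arrival = i * interarrival
        start = max(arrival, server_free_at)   # wait behind earlier requests
        finish = start + service_time
        server_free_at = finish
        # Only requests that finish within the timeout count toward goodput.
        if finish - arrival <= timeout_s and finish < seconds:
            goodput[int(finish)] += 1
    return goodput
```

In this model, `simulate(3400, 2000)` produces nonzero goodput only in the first second: the queueing delay grows with every arrival until it exceeds the timeout, after which the server does only stale work. With `simulate(1900, 2000)` the queue never builds and every request completes on time.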

After manually verifying the maximum goodput of the system, we turn on the self-protecting features: the stub uses the AIMD sending window size, and bricks are forced to service only requests that have not timed out. We then run the experiment again.
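The brick-side half of this protection can be sketched as follows. This is a simplified illustration, not SSM's code: it assumes each queued request carries its arrival timestamp on the brick's own clock, and the function name, queue layout, and `handle` callback are hypothetical.

```python
import time
from collections import deque


def serve_fresh(queue, handle, timeout_s=0.060, now=time.monotonic):
    """Drain the queue, servicing only requests that have not yet timed out.

    queue: deque of (arrival_time, request) pairs in arrival order.
    handle: callable invoked for each request that is still worth servicing.
    now: clock function, injectable so the behavior can be tested.
    Returns (served, dropped) counts.
    """
    served, dropped = 0, 0
    while queue:
        arrival, request = queue.popleft()
        if now() - arrival > timeout_s:
            # The sender has already given up on this request, so servicing it
            # would consume brick capacity without contributing to goodput.
            dropped += 1
            continue
        handle(request)
        served += 1
    return served, dropped
```

Discarding stale requests keeps the brick's capacity available for requests that can still complete in time, which is what prevents the collapse seen in the basic system.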

We generate an even higher load than what caused goodput to fall to zero in the basic system, using 240 threads, corresponding to roughly 4000 requests per second. As seen in Figure 7, SSM discovers the maximum goodput and the system continues to operate at that level. Because the bricks are already at capacity, the excess load is simply rejected; the percentage of rejected requests is discussed in the next section. We sketch a simple and reasonable shedding policy in future work.

Figure 7: SSM running with 3 bricks, with AIMD and admission control. SSM discovers a maximum goodput of around 2000 requests per second.


Benjamin Chan-Bin Ling 2004-03-04