Next: Conclusions Up: Session State: Beyond Soft Previous: Related Work

Future Work

SSM currently does not tolerate catastrophic site failures, but it can be extended to do so. When selecting bricks for a write, SSM would choose $W_{local}$ bricks from the local network and $W_{remote}$ bricks from a remote site. It would return from the write once $WQ_{local}$ local bricks and one remote brick have replied.
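The selection and return rule described above can be sketched as follows. This is only an illustrative sketch, not SSM's implementation: the `Brick` type, the `site` labels, and both function names are assumptions introduced here, and random selection within each site is assumed for simplicity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """Hypothetical brick descriptor; `site` is "local" or "remote"."""
    name: str
    site: str


def select_write_set(bricks, w_local, w_remote, rng=random):
    """Pick w_local local bricks and w_remote remote bricks at random."""
    local = [b for b in bricks if b.site == "local"]
    remote = [b for b in bricks if b.site == "remote"]
    if len(local) < w_local or len(remote) < w_remote:
        raise ValueError("not enough bricks available at one of the sites")
    return rng.sample(local, w_local) + rng.sample(remote, w_remote)


def write_may_return(acked, wq_local, min_remote=1):
    """True once wq_local local bricks and min_remote remote bricks replied."""
    local_acks = sum(1 for b in acked if b.site == "local")
    remote_acks = sum(1 for b in acked if b.site == "remote")
    return local_acks >= wq_local and remote_acks >= min_remote
```

Requiring a remote acknowledgment before returning means that every completed write survives the loss of the local site, at the cost of adding a wide-area round trip to write latency.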

Intelligently shedding load is an area of active research. One policy is to let users who are already actively using the system continue, and to turn new sessions away; when the system is overloaded, this can be done by allowing writes only from users that have valid cookies. Alternatively, users can be binned into different classes in some external fashion, and under overload, SSM can be configured to service only selected classes.
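Both shedding policies reduce to a small admission check on each write. The sketch below is a hypothetical illustration: the function name, its parameters, and the rule that class-based filtering takes precedence when both are configured are assumptions, not part of SSM.

```python
def admit_write(has_valid_cookie, overloaded, user_class=None, served_classes=None):
    """Decide whether to accept a write under the two policies in the text.

    If `served_classes` is given, only users in those classes are served
    under overload; otherwise only users with a valid cookie (that is,
    existing sessions) are served. All names here are illustrative.
    """
    if not overloaded:
        return True  # no shedding when the system has spare capacity
    if served_classes is not None:
        return user_class in served_classes  # class-based policy
    return has_valid_cookie  # existing-session policy
```

For example, `admit_write(False, True)` rejects a new session under overload, while `admit_write(True, True)` lets an existing session continue.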

We are exploring the use of rolling reboots as a method of proactively avoiding failures.

Currently, Pinpoint monitors statistics that empirically correlate with injected failures; however, we have no proof that they are the most relevant ones. We intend to apply statistical learning theory to automatically determine which measurable features best correlate with failures.
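As a rough illustration of what ranking features by their association with failures could look like, the sketch below orders features by the absolute Pearson correlation between each feature and a 0/1 failure label. Plain correlation is a simple stand-in chosen here for brevity, not the statistical learning method the authors have in mind, and the function names and data layout are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(samples, failed):
    """Order feature names by |correlation| with the failure label.

    `samples` maps a (hypothetical) feature name to one value per
    observation; `failed` holds 1 where a failure was observed, else 0.
    """
    scores = {name: abs(pearson(values, failed)) for name, values in samples.items()}
    return sorted(scores, key=scores.get, reverse=True)
```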


Benjamin Chan-Bin Ling 2004-03-04