Intelligently shedding load is an area of active research. One policy is to let users who are already actively using the system continue, and to turn new sessions away; when the system is overloaded, this can be done by allowing writes only from users that have valid cookies. Alternatively, users can be binned into different classes in some external fashion, and under overload, SSM can be configured to service only selected classes.
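The two shedding policies can be illustrated with a minimal sketch. It is not SSM's implementation: the function names, the set of valid cookies, and the class labels are all hypothetical, and how overload is detected is assumed to be decided elsewhere and passed in as a flag.

```python
from typing import Optional, Set


def admit_existing_sessions_only(
    overloaded: bool, cookie: Optional[str], valid_cookies: Set[str]
) -> bool:
    """Policy 1 (hypothetical sketch): under overload, accept a write only
    if the request carries a cookie that is already known to be valid,
    so that new sessions are turned away."""
    if not overloaded:
        return True  # no shedding when the system is healthy
    return cookie is not None and cookie in valid_cookies


def admit_by_class(
    overloaded: bool, user_class: str, serviced_classes: Set[str]
) -> bool:
    """Policy 2 (hypothetical sketch): users are binned into classes by
    some external mechanism; under overload, only the configured classes
    are serviced."""
    if not overloaded:
        return True
    return user_class in serviced_classes


if __name__ == "__main__":
    known = {"cookie-a", "cookie-b"}  # illustrative values only
    print(admit_existing_sessions_only(True, "cookie-a", known))
    print(admit_existing_sessions_only(True, None, known))
    print(admit_by_class(True, "gold", {"gold"}))
    print(admit_by_class(True, "free", {"gold"}))
```

Both functions admit every request when the system is not overloaded, so the policy only takes effect once shedding is needed.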
We are exploring the use of rolling reboots as a method of proactively avoiding failures.
Currently, Pinpoint monitors statistics that empirically correlate with injected failures; however, we have no proof that they are the most relevant ones. We intend to apply statistical learning theory to automatically determine which measurable features best correlate with failures.
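As a rough illustration of what ranking features by their correlation with failures could look like, the sketch below scores each monitored feature by the absolute Pearson correlation between its observed values and a 0/1 failure label. This is a deliberately simple stand-in, not the statistical learning approach we intend to apply, and the feature names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series
    is constant (the coefficient is undefined in that case)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(
    samples: Dict[str, List[float]], failed: List[float]
) -> List[Tuple[str, float]]:
    """Rank features by |correlation| with the failure label,
    strongest first."""
    scores = [
        (name, abs(pearson(values, failed)))
        for name, values in samples.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical observations: one value per monitoring interval.
    samples = {
        "latency": [1.0, 1.0, 5.0, 6.0],
        "noise": [3.0, 1.0, 3.0, 1.0],
    }
    failed = [0.0, 0.0, 1.0, 1.0]  # 1 = a failure was injected
    for name, score in rank_features(samples, failed):
        print(name, round(score, 3))
```

A linear correlation of this kind only captures monotone, single-feature relationships; features that matter only in combination would be missed.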