The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment
Mary Baker and Mark Sullivan, University of California, Berkeley
As organizations with high system availability requirements move to UNIX, the elimination of down-time in the UNIX environment becomes a more important issue. Designing for fast recovery, rather than crash prevention, can provide low-cost, highly available systems without sacrificing performance or simplicity. In Sprite, a UNIX-like distributed operating system, we accomplish this fast recovery in part through the use of a recovery box: a stable area of memory in which the system stores carefully selected pieces of system state, and from which the system can be regenerated quickly. Error detection using checksums allows the system to revert to its traditional reboot sequence if the recovery box data is corrupted during system failure. Recent statistics about the types and frequencies of operating system failures indicate that fast recovery using the recovery box will be possible most of the time. Using our recovery box implementation, a Sprite file server recovers in 26 seconds and a database manager with ten remote client processes recovers in six seconds, fast enough that many users and applications will not care that the system crashed.
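The checksum-guarded fallback described in the abstract can be illustrated with a small sketch. This is a toy model, not the Sprite implementation: the class `RecoveryBox`, its methods, the `restart` function, and the choice of CRC32 are all assumptions made for illustration, since the abstract does not specify the data layout or the checksum algorithm.

```python
import zlib


class RecoveryBox:
    """Toy model of a checksum-protected state store (hypothetical API)."""

    def __init__(self):
        # Maps an item key to (payload bytes, CRC32 of the payload).
        # CRC32 is an assumption; the abstract only says "checksums".
        self._items = {}

    def insert(self, key, payload):
        """Store a piece of system state together with its checksum."""
        payload = bytes(payload)
        self._items[key] = (payload, zlib.crc32(payload))

    def corrupt(self, key):
        """Simulate crash damage: flip one bit of a non-empty payload
        without updating the stored checksum."""
        payload, checksum = self._items[key]
        damaged = bytes([payload[0] ^ 0x01]) + payload[1:]
        self._items[key] = (damaged, checksum)

    def recover(self):
        """Return all stored state if every checksum matches, else None."""
        state = {}
        for key, (payload, checksum) in self._items.items():
            if zlib.crc32(payload) != checksum:
                return None  # corruption detected; contents are not trusted
            state[key] = payload
        return state


def restart(box):
    """Choose the recovery path after a failure (hypothetical helper)."""
    state = box.recover()
    if state is None:
        # Corrupted box: fall back to the traditional reboot sequence.
        return "full reboot", {}
    # Intact box: regenerate system state directly from the stored items.
    return "fast recovery", state
```

In this sketch, a single failed checksum discards the whole box rather than only the damaged item. That is a conservative simplification chosen for the example, not a claim about how Sprite handles partial corruption.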
author = {Mary Baker and Mark Sullivan},
title = {The Recovery Box: Using Fast Recovery to Provide High Availability in the {UNIX} Environment},
booktitle = {USENIX Summer 1992 Technical Conference},
year = {1992},
address = {San Antonio, TX},
url = {https://www.usenix.org/conference/usenix-summer-1992-technical-conference/recovery-box-using-fast-recovery-provide-high},
publisher = {USENIX Association},
month = jun
}