Sunil Kittur and Douglas Steel
ICL High Performance Systems, Manchester, UK
Francois Armand and Jim Lipkis
Chorus Systems, Saint-Quentin-en-Yvelines, France
This paper describes mechanisms that allow resources on failed nodes to fail over to surviving nodes. Failover is currently restricted to disk volumes and file systems. The mechanisms preserve correct UNIX system-call semantics for operations issued from surviving nodes that were in progress at the time of the failure, including non-idempotent operations.
The mechanisms impose minimal resource and performance overhead during normal operation. In contrast to replication techniques, state is recovered and rebuilt at failover time.
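Non-idempotent operations are the hard case because a retry is not harmless. If a server completes an `unlink` and then fails before replying, a client that blindly reissues the call to the takeover node gets `ENOENT` for an operation that actually succeeded. The sketch below illustrates that hazard with a generic duplicate-request table keyed by (client, sequence number). It is a textbook technique named here for illustration, not the authors' mechanism: the abstract says their state is rebuilt at failover time, whereas this toy keeps the table during normal running. `Volume`, `unlink`, `naive_unlink`, and the assumption that the table is visible to the takeover node are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical toy volume. We ASSUME both fields are reachable by
    whichever node currently serves the volume (e.g. kept on shared disk)."""
    files: set[str] = field(default_factory=set)
    # (client_id, seq) -> reply already produced for that request
    completed: dict[tuple[str, int], str] = field(default_factory=dict)


def naive_unlink(vol: Volume, name: str) -> str:
    """Re-executes on every call: a retry after failover can return ENOENT."""
    if name not in vol.files:
        return "ENOENT"
    vol.files.remove(name)
    return "OK"


def unlink(vol: Volume, client: str, seq: int, name: str) -> str:
    """Executes each (client, seq) request at most once and replays its reply."""
    key = (client, seq)
    if key in vol.completed:       # retry of a request that already ran
        return vol.completed[key]  # return the original reply unchanged
    result = naive_unlink(vol, name)
    vol.completed[key] = result
    return result
```

For example, with `vol = Volume(files={"a.txt"})`, calling `unlink(vol, "n2", 1, "a.txt")` twice (the second call standing in for a retry to the takeover node) returns `"OK"` both times, whereas calling `naive_unlink(vol, "a.txt")` at that point returns `"ENOENT"`.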