NSDI '06 Abstract
Pp. 45–58 of the Proceedings
Efficient Replica Maintenance for Distributed Storage Systems
Byung-Gon Chun, University of California, Berkeley; Frank Dabek, MIT Computer Science and Artificial Intelligence Laboratory; Andreas Haeberlen, Rice University/MPI-SWS; Emil Sit, MIT Computer Science and Artificial Intelligence Laboratory; Hakim Weatherspoon, University of California, Berkeley; M. Frans Kaashoek, MIT Computer Science and Artificial Intelligence Laboratory; John Kubiatowicz, University of California, Berkeley; Robert Morris, MIT Computer Science and Artificial Intelligence Laboratory
Abstract
This paper considers replication
strategies for storage systems that aggregate the disks of many
nodes spread over the Internet. Maintaining replication in such
systems can be prohibitively expensive, since every transient
network or host failure could potentially lead to copying a
server's worth of data over the Internet to maintain replication
levels.
The following insights for designing an efficient replication
algorithm emerge from the paper's analysis. First, durability can
be provided separately from availability; the former is less
expensive to ensure and a more useful goal for many wide-area
applications. Second, the focus of a durability algorithm must be
to create new copies of data objects faster than permanent disk
failures destroy the objects; careful choice of policies for what
nodes should hold what data can decrease repair time. Third,
increasing the number of replicas of each data object does not help
a system tolerate a higher disk failure probability, but does help
tolerate bursts of failures. Finally, ensuring that the system
makes use of replicas that recover after temporary failure is
critical to efficiency.
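To make the second and third insights concrete, here is a toy birth-death simulation; it is my own illustration, not code or a model from the paper, and all rates are hypothetical. Each replica of an object suffers a permanent failure at rate lam, and the system creates one new copy at rate mu whenever the object is below its replica target.

import random

def object_survives(n_target, lam, mu, horizon, rng):
    """Simulate one object's replica count as a birth-death chain."""
    t, n = 0.0, n_target
    while True:
        fail_rate = n * lam                        # each replica fails independently
        repair_rate = mu if n < n_target else 0.0  # repairs run only below target
        total = fail_rate + repair_rate
        t += rng.expovariate(total)
        if t >= horizon:
            return True                            # object still has replicas
        if rng.random() < fail_rate / total:
            n -= 1
            if n == 0:
                return False                       # all replicas destroyed
        else:
            n += 1

rng = random.Random(0)
for mu in (0.5, 4.0):
    for n_target in (2, 4, 8):
        trials = 2000
        alive = sum(object_survives(n_target, 1.0, mu, 10.0, rng)
                    for _ in range(trials))
        print(f"repair rate {mu}, replicas {n_target}: "
              f"{alive / trials:.1%} of objects survive")

When repair is slower than failure (mu = 0.5), objects are eventually lost no matter how many replicas they start with; when repair outpaces failure (mu = 4.0), additional replicas markedly improve the chance of surviving a burst of failures.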
Based on these insights, the paper proposes the
Carbonite replication algorithm for keeping data durable at a low
cost. A simulation of Carbonite storing 1 TB of data over a 365-day
trace of PlanetLab activity shows that Carbonite keeps all
data durable and uses 44% more network traffic than a hypothetical
system that only responds to permanent failures. In comparison,
Total Recall and DHash require almost a factor of two more network
traffic than this hypothetical system.
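As an illustration of the final insight and of the maintenance rule the abstract attributes to Carbonite, the sketch below (my own Python with hypothetical names, not the paper's code) creates a new copy only when the count of replicas on reachable nodes drops below a target, and lets replicas that reappear after a transient failure count toward that target instead of being discarded:

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True        # reachability: False models a transient failure
        self.store = {}       # object id -> data held on this node's disk

def maintain(obj_id, data, nodes, target):
    """Copy the object only while fewer than `target` reachable replicas exist."""
    reachable = [n for n in nodes if n.up and obj_id in n.store]
    candidates = [n for n in nodes if n.up and obj_id not in n.store]
    while len(reachable) < target and candidates:
        dest = candidates.pop(0)
        dest.store[obj_id] = data   # in a real system, a network transfer
        reachable.append(dest)
    return len(reachable)

nodes = [Node(f"n{i}") for i in range(5)]
maintain("obj", b"data", nodes, target=3)         # initial replication: 3 copies
nodes[0].up = False                               # transient failure of one node
maintain("obj", b"data", nodes, target=3)         # one replacement copy is made
nodes[0].up = True                                # node returns holding its copy
print(maintain("obj", b"data", nodes, target=3))  # prints 4: no further copying

Because the returning node's copy is reintegrated, the replacement made during the outage simply raises the replica count; nothing is deleted and no further network traffic is triggered.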