Robustness in the Salus Scalable Block Store
Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin, The University of Texas at Austin
This paper describes Salus, a block store that seeks to maximize both scalability and robustness simultaneously. Salus provides strong end-to-end correctness guarantees for read operations, strict ordering guarantees for write operations, and strong durability and availability guarantees despite a wide range of server failures (including memory corruptions, disk corruptions, firmware bugs, etc.). This increased protection does not come at the cost of scalability or performance: indeed, Salus often outperforms HBase (the codebase from which Salus descends). For example, Salus's active replication allows it to halve network bandwidth while increasing aggregate write throughput by a factor of 1.74 compared to HBase in a well-provisioned system.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
@inproceedings{wang_nsdi13,
  author = {Yang Wang and Manos Kapritsos and Zuocheng Ren and Prince Mahajan and Jeevitha Kirubanandam and Lorenzo Alvisi and Mike Dahlin},
  title = {Robustness in the Salus Scalable Block Store},
  booktitle = {10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13)},
  year = {2013},
  isbn = {978-1-931971-00-3},
  address = {Lombard, IL},
  pages = {357--370},
  url = {https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/wang_yang},
  publisher = {USENIX Association},
  month = apr
}
Presentation Video
Presentation Audio
by Arvind Krishnamurthy
Today's Internet services are characterized by a few clear trends. An increasingly large fraction of the services are hosted in the cloud, there is progressively greater use of cheap and off-the-shelf hardware inside cloud data centers, and users are generating increasingly larger volumes of data. Given these trends, a cloud storage system has to meet a number of demanding requirements: the platform needs to be scalable, it has to provide strong consistency semantics, and it has to be robust to a broad class of failures. Numerous systems exist that partially satisfy some of the aforementioned requirements. For example, BFT-based systems provide strong robustness but at the cost of scalability, DHTs are highly scalable but fail to provide consistency and robustness, and systems such as BigTable and HBase provide moderate levels of scalability but not reliability in the face of data corruption.
It is in this context that the authors offer a fresh perspective on building scalable and robust storage systems. They design and implement Salus, the first block store that is both scalable and robust to commission failures. Like Amazon's Elastic Block Store (EBS), Salus provides a virtual block-level device that can be mounted for reading and writing by a single client at a time. Salus's implementation descends from HBase but differs from it in two key respects: an HBase table can be shared by many clients, whereas a Salus volume serves a single client; and HBase is robust only to omission failures (e.g., crashes or dropped messages), whereas Salus also tolerates commission failures (e.g., data corruption). Crucially, Salus establishes the surprising result that robustness and scalability are not necessarily in tension: a carefully designed system can tolerate many types of failures and provide end-to-end correctness guarantees without incurring high replication costs.
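To make the abstraction concrete, here is a minimal sketch, with hypothetical names, of the kind of single-client block-device interface the summary describes; it is not taken from Salus or EBS, only an illustration of how such a volume differs from a shared key-value table.

```python
from abc import ABC, abstractmethod

class BlockVolume(ABC):
    """EBS-style virtual disk: addressed by block number, one client at a time."""

    @abstractmethod
    def mount(self, client_id: str) -> None:
        """Attach the volume to a single client; a second concurrent mount is rejected."""

    @abstractmethod
    def read_block(self, index: int) -> bytes:
        """Return the current contents of block `index`."""

    @abstractmethod
    def write_blocks(self, writes: dict[int, bytes]) -> None:
        """Apply a batch of block writes; Salus commits such batches in a strict order."""
```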
Salus achieves this result through a combination of techniques. Its scalability comes from the traditional technique of striping a volume across many storage servers; unlike most striping schemes, however, Salus supports totally ordered operations that can be executed concurrently across servers without a single serialization point. Salus replicates both the storage and the computation layers to safeguard against data corruption. And it provides end-to-end reliability by maintaining a Merkle tree for each volume, which a client can consult to validate data retrieved from the storage system. While elements of this approach have been proposed before, the key contribution is the unique combination of ideas, which work quite well together.
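To illustrate the end-to-end verification idea, the following is a small, hypothetical Python sketch of a client-side Merkle tree used to check blocks returned by untrusted storage servers. It is an illustration of the general technique, not Salus's actual data structure (which covers a striped, replicated volume); the `volume` object and its `read_block` method are assumptions for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleTree:
    """Binary hash tree over the hashes of a volume's blocks, kept by the client."""

    def __init__(self, block_hashes):
        self.levels = [list(block_hashes)]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            # Pair up nodes; an odd node at the end is hashed with itself.
            nxt = [h(prev[i] + (prev[i + 1] if i + 1 < len(prev) else prev[i]))
                   for i in range(0, len(prev), 2)]
            self.levels.append(nxt)

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, block: bytes) -> None:
        """Refresh the leaf for a written block and recompute hashes up to the root."""
        self.levels[0][index] = h(block)
        for lvl in range(1, len(self.levels)):
            index //= 2
            prev = self.levels[lvl - 1]
            left = prev[2 * index]
            right = prev[2 * index + 1] if 2 * index + 1 < len(prev) else left
            self.levels[lvl][index] = h(left + right)

def verified_read(volume, tree: MerkleTree, index: int) -> bytes:
    """Read a block from the (untrusted) storage servers and check it against
    the client's locally maintained hash before returning it."""
    block = volume.read_block(index)  # may be corrupted by a faulty server
    if h(block) != tree.levels[0][index]:
        raise IOError(f"block {index} failed end-to-end verification")
    return block
```

In this sketch, the client would call `tree.update(...)` for every block in each acknowledged write batch, so the tree always reflects the contents the volume is supposed to hold; any commission failure that alters stored data is then caught at read time.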