A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster
K. V. Rashmi and Nihar B. Shah, University of California, Berkeley; Dikang Gu, Hairong Kuang, and Dhruba Borthakur, Facebook; Kannan Ramchandran, University of California, Berkeley
Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal storage efficiency, they incur high network and disk usage when recovering missing data.
In this paper, we first present a study on the impact of recovery operations of erasure-coded data on the data-center network, based on measurements from Facebook’s warehouse cluster in production. To the best of our knowledge, this is the first study of its kind available in the literature. Our study reveals that recovery of RS-coded data results in a significant increase in network traffic, more than a hundred terabytes per day, in a cluster storing multiple petabytes of RS-coded data.
To address this issue, we present a new storage code, built on our recently proposed Piggybacking framework, that reduces the network and disk usage during recovery by 30% in theory, while also being storage optimal and supporting arbitrary design parameters. The implementation of the proposed code in the Hadoop Distributed File System (HDFS) is underway. We use the measurements from the warehouse cluster to show that the proposed code would lead to a reduction of close to fifty terabytes of cross-rack traffic per day.
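The trade-off described in the abstract can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not the authors' code or measurement method. It assumes a hypothetical (k = 10, r = 4) code and uses the general property that rebuilding one missing block of an RS-coded stripe requires reading and transferring k surviving blocks. The function names are illustrative, and the 30% figure is the theoretical saving quoted in the abstract.

```python
def storage_overhead(k: int, r: int) -> float:
    """Bytes stored per byte of user data for a stripe of k data + r parity blocks."""
    return (k + r) / k


def rs_recovery_blocks(k: int) -> int:
    """Blocks read and transferred to rebuild one missing block under an RS code."""
    return k


def reduced_recovery_blocks(k: int, saving: float) -> float:
    """Blocks' worth of data transferred if recovery traffic drops by `saving` (0..1)."""
    return k * (1.0 - saving)


if __name__ == "__main__":
    k, r = 10, 4      # assumed parameters, chosen only for illustration
    saving = 0.30     # theoretical reduction stated in the abstract

    print(f"RS({k},{r}) storage overhead: {storage_overhead(k, r):.2f}x")
    print("3-way replication storage overhead: 3.00x")
    print(f"RS recovery of one block transfers {rs_recovery_blocks(k)} blocks")
    print("Replication recovery of one block transfers 1 block")
    print(f"With a {saving:.0%} saving: "
          f"{reduced_recovery_blocks(k, saving):.1f} blocks' worth of data")
```

Under these assumed parameters, the code stores far less than three-way replication but moves ten times as much data to repair a single block, which is the recovery cost the paper targets.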
@inproceedings{rashmi2013hotstorage,
author = {K. V. Rashmi and Nihar B. Shah and Dikang Gu and Hairong Kuang and Dhruba Borthakur and Kannan Ramchandran},
title = {A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster},
booktitle = {5th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 13)},
year = {2013},
address = {San Jose, CA},
url = {https://www.usenix.org/conference/hotstorage13/workshop-program/presentation/rashmi},
publisher = {USENIX Association},
month = jun
}