sponsors
help promote
Get more
Help Promote graphics!
usenix conference policies
You are here
RAIDShield: Characterizing, Monitoring, and Proactively Protecting Against Disk Failures
Ao Ma, Fred Douglis, Guanlin Lu, and Darren Sawyer, EMC Corporation; Surendar Chandra and Windsor Hsu, Datrium, Inc.
Modern storage systems orchestrate a group of disks to achieve their performance and reliability goals. Even though such systems are designed to withstand the failure of individual disks, failure of multiple disks poses a unique set of challenges. We empirically investigate disk failure data from a large number of production systems, specifically focusing on the impact of disk failures on RAID storage systems. Our data covers about one million SATA disks from 6 disk models for periods up to 5 years. We show how observed disk failures weaken the protection provided by RAID. The count of reallocated sectors correlates strongly with impending failures.
With these findings we designed RAIDSHIELD, which consists of two components. First, we have built and evaluated an active defense mechanism that monitors the health of each disk and replaces those that are predicted to fail imminently. This proactive protection has been incorporated into our product and is observed to eliminate 88% of triple disk errors, which are 80% of all RAID failures. Second, we have designed and simulated a method of using the joint failure probability to quantify and predict how likely a RAID group is to face multiple simultaneous disk failures, which can identify disks that collectively represent a risk of failure even when no individual disk is flagged in isolation. We find in simulation that RAID-level analysis can effectively identify most vulnerable RAID-6 systems, improving the coverage to 98% of triple errors.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Ao Ma and Fred Douglis and Guanlin Lu and Darren Sawyer and Surendar Chandra and Windsor Hsu},
title = {{RAIDShield}: Characterizing, Monitoring, and Proactively Protecting Against Disk Failures},
booktitle = {13th USENIX Conference on File and Storage Technologies (FAST 15)},
year = {2015},
isbn = {978-1-931971-201},
address = {Santa Clara, CA},
pages = {241--256},
url = {https://www.usenix.org/conference/fast15/technical-sessions/presentation/ma},
publisher = {USENIX Association},
month = feb
}
connect with us