Fault Management in Map-Reduce Through Early Detection of Anomalous Nodes
Selvi Kadirvel, Jeffrey Ho, and José A. B. Fortes, University of Florida
Map-Reduce frameworks such as Hadoop have built-in fault-tolerance mechanisms that allow jobs to run to completion even in the presence of certain faults. However, these jobs can experience severe performance penalties under faulty conditions. In this paper, we present Fault-Managed Map-Reduce (FMR), which augments Hadoop with the functionality to mitigate job execution time penalties. FMR uses an anomaly detection algorithm based on sparse coding to anticipate a faulty slave node. The proposed technique has the following key advantages: (1) model training uses only normal-class data, (2) the time taken for prediction is less than a second, and (3) confidence estimates are produced along with the anomaly prediction. FMR uses the result of anomaly detection to invoke a closed-loop recovery action, namely dynamic resource scaling. A scaling heuristic is proposed to determine the extent of scaling necessary to reduce the impending performance penalty. FMR facilitates practical adoption by being implemented as a set of libraries and scripts that require no changes to the underlying source code of Hadoop. A set of realistic Map-Reduce applications was studied through a few thousand job executions on a 72-node Hadoop testbed. Detailed empirical evaluation shows that FMR successfully mitigates performance penalties from 119% down to 14%, averaged across experiments.
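The abstract does not give the details of the detection algorithm, so the following is only a minimal sketch of the general idea behind sparse-coding anomaly detection, not the authors' implementation. It assumes a dictionary built from normal-class samples only, codes a new sample with a plain ISTA solver, and uses the reconstruction error as the anomaly score. The three-metric feature vectors, the function names, the penalty `lam`, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 using ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return a

def anomaly_score(D, x, lam=0.1):
    """Reconstruction error of x under a sparse code over the normal-class dictionary."""
    a = sparse_code(D, x, lam)
    return float(np.linalg.norm(x - D @ a))

# Hypothetical normal-class training data: one column per sample, three node
# metrics per sample, where the third metric equals the sum of the first two.
normal_train = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
    [2.0, 1.0, 3.0],
]).T
D = normal_train / np.linalg.norm(normal_train, axis=0)  # unit-norm dictionary atoms

normal_test = np.array([1.5, 0.5, 2.0])     # follows the normal pattern
anomalous_test = np.array([1.5, 0.5, 5.0])  # third metric deviates from the pattern

THRESHOLD = 1.0  # assumed value; in practice it would be calibrated on held-out normal data
for name, x in [("normal", normal_test), ("anomalous", anomalous_test)]:
    score = anomaly_score(D, x)
    print(name, round(score, 3), "flagged" if score > THRESHOLD else "ok")
```

Because the dictionary contains only normal behavior, a sample that departs from the normal pattern cannot be reconstructed well from a few atoms, which is what drives its score up. This sketch does not attempt the confidence estimates or the resource-scaling step that the abstract mentions.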
author = {Selvi Kadirvel and Jeffrey Ho and Jos{\'e} A. B. Fortes},
title = {Fault Management in {Map-Reduce} Through Early Detection of Anomalous Nodes},
booktitle = {10th International Conference on Autonomic Computing (ICAC 13)},
year = {2013},
isbn = {978-1-931971-02-7},
address = {San Jose, CA},
pages = {235--245},
url = {https://www.usenix.org/conference/icac13/technical-sessions/presentation/kadirvel},
publisher = {USENIX Association},
month = jun
}