The Power of Choice in Data-Aware Cluster Scheduling
Shivaram Venkataraman and Aurojit Panda, University of California, Berkeley; Ganesh Ananthanarayanan, Microsoft Research; Michael J. Franklin and Ion Stoica, University of California, Berkeley
Providing timely results in the face of rapid growth in data volumes has become important for analytical frameworks. For this reason, frameworks increasingly operate on only a subset of the input data. A key property of such sampling is that combinatorially many subsets of the input are present. We present KMN, a system that leverages these choices to perform data-aware scheduling, i.e., minimize time taken by tasks to read their inputs, for a DAG of tasks. KMN not only uses choices to co-locate tasks with their data but also percolates such combinatorial choices to downstream tasks in the DAG by launching a few additional tasks at every upstream stage. Evaluations using workloads from Facebook and Conviva on a 100-machine EC2 cluster show that KMN reduces average job duration by 81% using just 5% additional resources.
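To make the idea of "choice" concrete, the sketch below picks K of N input blocks and prefers blocks that have a replica on a machine with a free task slot. It is a simplified greedy illustration, not the KMN scheduler itself: the function name `pick_local_blocks`, the dictionary-based inputs, and the fallback to remote reads are all assumptions introduced here.

```python
from math import comb


def pick_local_blocks(replicas, free_slots, k):
    """Greedily pick up to k blocks, preferring data-local placements.

    replicas:   dict mapping block id -> list of machines holding a copy
    free_slots: dict mapping machine -> number of free task slots
    Returns a list of (block, machine, is_local) tuples. The list can be
    shorter than k if the cluster runs out of free slots.
    """
    if k > len(replicas):
        raise ValueError("k cannot exceed the number of blocks")
    slots = dict(free_slots)  # work on a copy; leave the caller's dict alone
    chosen, leftover = [], []

    # First pass: take blocks that can be read locally.
    for block, machines in replicas.items():
        if len(chosen) == k:
            break
        host = next((m for m in machines if slots.get(m, 0) > 0), None)
        if host is None:
            leftover.append(block)
        else:
            slots[host] -= 1
            chosen.append((block, host, True))

    # Second pass: if fewer than k blocks were local, fall back to remote reads.
    for block in leftover:
        if len(chosen) == k:
            break
        host = next((m for m, free in slots.items() if free > 0), None)
        if host is None:
            break  # no free slots anywhere
        slots[host] -= 1
        chosen.append((block, host, False))
    return chosen


# Hypothetical example: 4 blocks, a sample of 2 is enough.
replicas = {"b0": ["m1"], "b1": ["m1"], "b2": ["m2"], "b3": ["m3"]}
free_slots = {"m2": 1, "m3": 1}
picked = pick_local_blocks(replicas, free_slots, k=2)
num_subsets = comb(4, 2)  # number of distinct 2-block samples to choose from
```

In this example a scheduler forced to read `b0` and `b1` would find no free slot on `m1`, whereas being free to use any 2 of the 4 blocks lets both tasks run where their data already lives.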
author = {Shivaram Venkataraman and Aurojit Panda and Ganesh Ananthanarayanan and Michael J. Franklin and Ion Stoica},
title = {The Power of Choice in {Data-Aware} Cluster Scheduling},
booktitle = {11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)},
year = {2014},
isbn = {978-1-931971-16-4},
address = {Broomfield, CO},
pages = {301--316},
url = {https://www.usenix.org/conference/osdi14/technical-sessions/presentation/venkataraman},
publisher = {USENIX Association},
month = oct
}