Performance Inconsistency in Large Scale Data Processing Clusters

Authors: 

Mingyuan Xia and Nan Zhu, McGill University; Yuxiong He and Sameh Elnikety, Microsoft Research Redmond; Xue Liu, McGill University

Abstract: 

A large shared computing platform is usually divided into several virtual clusters (VCs) of fixed sizes, each used by one team. A cluster scheduler dynamically allocates physical servers to the VCs according to their sizes and current job demands. In this paper, we show that current cluster schedulers, which optimize for instantaneous fairness, cause performance inconsistency among VCs: VCs with similar loads see very different performance characteristics.

We identify this problem by studying a production trace from a large cluster and by performing a simulation study. Our results demonstrate that, under an instantaneous-fairness scheduler, a large VC that contributes more resources to other VCs during its underload periods cannot be properly rewarded during its overload periods. These results suggest that ignoring the history of resource sharing is the root cause of the performance inconsistency.
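
The following Python sketch illustrates the effect described above under simplifying assumptions of our own, not the paper's scheduler model: the instantaneous-fairness scheduler is modeled as weighted max-min fairness (water-filling), where each VC's weight is its size, and each period is allocated independently of earlier periods. The VC names, sizes, demands, and the helper names weighted_max_min and run are hypothetical and are not taken from the paper's trace.

```python
from typing import Dict, List, Tuple


def weighted_max_min(capacity: float, shares: Dict[str, float],
                     demand: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical instantaneous-fairness allocator (weighted water-filling).

    Capacity is split in proportion to shares; a VC whose proportional grant
    covers its demand is capped at that demand, and the leftover is re-split
    among the remaining VCs. No history is consulted.
    """
    alloc = {vc: 0.0 for vc in shares}
    active = {vc for vc in shares if demand[vc] > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_share = sum(shares[vc] for vc in active)
        grant = {vc: remaining * shares[vc] / total_share for vc in active}
        satisfied = {vc for vc in active if grant[vc] >= demand[vc]}
        if not satisfied:
            # Every active VC wants more than its proportional grant.
            for vc in active:
                alloc[vc] = grant[vc]
            break
        for vc in satisfied:
            alloc[vc] = demand[vc]
            remaining -= demand[vc]
        active -= satisfied
    return alloc


CAPACITY = 100.0
SHARES = {"A": 70.0, "B": 30.0}  # hypothetical sizes: A is the large VC
DEMANDS = [                       # hypothetical per-period server demands
    {"A": 20.0, "B": 80.0},       # period 0: A underloaded, B overloaded
    {"A": 100.0, "B": 100.0},     # period 1: both overloaded
]


def run(capacity: float, shares: Dict[str, float],
        demands: List[Dict[str, float]]
        ) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Allocate each period independently; track servers received minus entitlement."""
    total = sum(shares.values())
    balance = {vc: 0.0 for vc in shares}
    history = []
    for demand in demands:
        alloc = weighted_max_min(capacity, shares, demand)
        for vc in shares:
            balance[vc] += alloc[vc] - capacity * shares[vc] / total
        history.append(alloc)
    return history, balance


if __name__ == "__main__":
    history, balance = run(CAPACITY, SHARES, DEMANDS)
    for t, alloc in enumerate(history):
        print(t, alloc)
    print("balance", balance)
```

In this toy run, VC A lends 50 servers to VC B in period 0 (A receives 20 against an entitlement of 70), yet in period 1, when both VCs are overloaded, A receives only its entitled 70. The balance therefore ends at -50 for A and +50 for B: the lent capacity is never repaid because each allocation depends only on current sizes and demands.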

BibTeX
@inproceedings {180176,
author = {Mingyuan Xia and Nan Zhu and Sameh Elnikety and Xue Liu and Yuxiong He},
title = {Performance Inconsistency in Large Scale Data Processing Clusters},
booktitle = {10th International Conference on Autonomic Computing (ICAC 13)},
year = {2013},
isbn = {978-1-931971-02-7},
address = {San Jose, CA},
pages = {297--302},
url = {https://www.usenix.org/conference/icac13/technical-sessions/presentation/xia},
publisher = {USENIX Association},
month = jun
}