Lube: Mitigating Bottlenecks in Wide Area Data Analytics

Authors: 

Hao Wang and Baochun Li, University of Toronto

Abstract: 

Over the past decade, we have witnessed exponential growth in the density (petabyte-level) and breadth (across geo-distributed datacenters) of data distribution. It has become increasingly challenging, yet imperative, to minimize the response times of data analytic queries over multiple geo-distributed datacenters. However, existing scheduling-based solutions have largely been motivated by pre-established mantras (e.g., bandwidth scarcity). Without data-driven insights into performance bottlenecks at runtime, schedulers might blindly assign tasks to workers that are suffering from unidentified bottlenecks.

In this paper, we present Lube, a system framework that minimizes query response times by detecting and mitigating bottlenecks at runtime. Lube monitors geo-distributed data analytic queries in real time, detects potential bottlenecks, and mitigates them with a bottleneck-aware scheduling policy. Our preliminary experiments on a real-world prototype across Amazon EC2 regions have shown that Lube can detect bottlenecks with over 90% accuracy, and reduce the median query response time by up to 33% compared to Spark’s built-in locality-based scheduler.
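
To make the contrast between locality-based and bottleneck-aware task placement concrete, here is a minimal toy sketch in Python. It is not Lube's algorithm or Spark's scheduler: the `Worker` fields, the single `utilization` metric, the 0.9 threshold, and the function names are all assumptions chosen for illustration. The sketch skips workers flagged as bottlenecked and only then prefers data locality.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # All fields are hypothetical; they are not taken from the paper.
    name: str
    utilization: float    # assumed runtime metric in [0, 1]
    has_local_data: bool  # whether the task's input is local to this worker


def is_bottlenecked(worker, threshold=0.9):
    """Flag a worker whose assumed utilization metric reaches the threshold."""
    return worker.utilization >= threshold


def pick_worker(workers, threshold=0.9):
    """Pick a worker, avoiding flagged ones, then preferring data locality."""
    healthy = [w for w in workers if not is_bottlenecked(w, threshold)]
    # Fall back to every worker if all of them are flagged.
    candidates = healthy or list(workers)
    # Sort key: local data first (False < True), then lowest utilization.
    return min(candidates, key=lambda w: (not w.has_local_data, w.utilization))


workers = [
    Worker("us-east", 0.95, True),    # holds the data but is overloaded
    Worker("eu-west", 0.40, False),
    Worker("ap-south", 0.70, False),
]
print(pick_worker(workers).name)
```

In this toy setup, a purely locality-driven choice would place the task on `us-east` because it holds the input data, while the bottleneck-aware choice avoids it and picks the least loaded healthy worker.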

BibTeX
@inproceedings {203330,
author = {Hao Wang and Baochun Li},
title = {Lube: Mitigating Bottlenecks in Wide Area Data Analytics},
booktitle = {9th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 17)},
year = {2017},
address = {Santa Clara, CA},
url = {https://www.usenix.org/conference/hotcloud17/program/presentation/wang},
publisher = {USENIX Association},
month = jul
}