Starburst: A Cost-aware Scheduler for Hybrid Cloud

Authors: 

Michael Luo, Siyuan Zhuang, Suryaprakash Vengadesan, and Romil Bhardwaj, UC Berkeley; Justin Chang, UC Santa Barbara; Eric Friedman, Scott Shenker, and Ion Stoica, UC Berkeley

Distinguished Artifact Award!

Abstract: 

To efficiently tackle bursts in job demand, organizations employ hybrid cloud architectures to scale their batch workloads from their private clusters to the public cloud. This requires transforming cluster schedulers into cloud-enabled versions to navigate the tradeoff between cloud costs and scheduler objectives such as job completion time (JCT). However, our analysis of production-level traces shows that existing cloud-enabled schedulers incur inefficient cost-JCT tradeoffs due to low cluster utilization.

We present Starburst, a system that maximizes cluster utilization to streamline the cost-JCT tradeoff. Starburst's scheduler dynamically controls jobs' waiting times to improve utilization: it assigns longer waits to large jobs to increase their chances of running on the cluster, and shorter waits to small jobs to increase their chances of running on the cloud. To offer configurability, Starburst provides system administrators with a simple waiting budget framework to tune their position on the cost-JCT curve. In a departure from traditional cluster schedulers, Starburst operates as a higher-level resource manager over a private cluster and dynamic cloud clusters. Simulations over production-level traces and real-world experiments on a 32-GPU private cluster show that Starburst can reduce cloud costs by up to 54-91% over existing cluster managers, while increasing average JCT by at most 5.8%.
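
The size-dependent waiting idea can be illustrated with a minimal Python sketch. It is not Starburst's implementation: the abstract does not give the waiting rule, so the proportional-to-GPU-hours formula, the `waiting_coeff` knob (standing in for the waiting budget), and the names `Job`, `max_wait_hours`, and `place` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int             # GPUs requested by the job
    est_runtime_h: float  # estimated runtime in hours (assumed to be known)


def max_wait_hours(job: Job, waiting_coeff: float) -> float:
    """Hypothetical rule: the waiting allowance grows with job size in GPU-hours."""
    return waiting_coeff * job.gpus * job.est_runtime_h


def place(job: Job, waited_h: float, free_gpus: int, waiting_coeff: float) -> str:
    """Return 'cluster', 'wait', or 'cloud' for a queued job."""
    if job.gpus <= free_gpus:
        return "cluster"  # fits on the private cluster right now
    if waited_h < max_wait_hours(job, waiting_coeff):
        return "wait"     # allowance not used up yet, keep queueing
    return "cloud"        # allowance exhausted, burst to the cloud


small = Job("small", gpus=1, est_runtime_h=0.5)
large = Job("large", gpus=8, est_runtime_h=4.0)

# Both jobs have waited one hour and the private cluster is full.
decisions = {
    j.name: place(j, waited_h=1.0, free_gpus=0, waiting_coeff=0.25)
    for j in (small, large)
}
print(decisions)  # {'small': 'cloud', 'large': 'wait'}
```

Under these assumptions, raising `waiting_coeff` keeps more jobs queued for the private cluster (lower cloud cost, higher JCT), while lowering it sends jobs to the cloud sooner.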

USENIX ATC '24 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {298492,
author = {Michael Luo and Siyuan Zhuang and Suryaprakash Vengadesan and Romil Bhardwaj and Justin Chang and Eric Friedman and Scott Shenker and Ion Stoica},
title = {Starburst: A Cost-aware Scheduler for Hybrid Cloud},
booktitle = {2024 USENIX Annual Technical Conference (USENIX ATC 24)},
year = {2024},
isbn = {978-1-939133-41-0},
address = {Santa Clara, CA},
pages = {37--57},
url = {https://www.usenix.org/conference/atc24/presentation/luo},
publisher = {USENIX Association},
month = jul
}