Load is not what you should balance: Introducing Prequal

Authors: 

Bartek Wydrowski, Google Research; Robert Kleinberg, Google Research and Cornell; Stephen M. Rumble, Google (YouTube); Aaron Archer, Google Research

Abstract: 

We present PReQuaL (Probing to Reduce Queuing and Latency), a load balancer for distributed multi-tenant systems. PReQuaL is designed to minimize real-time request latency in the presence of heterogeneous server capacities and non-uniform, time-varying antagonist load. To achieve this, PReQuaL actively probes server load and leverages the power-of-d-choices paradigm, extending it with asynchronous and reusable probes. Cutting against received wisdom, PReQuaL does not balance CPU load, but instead selects servers according to estimated latency and active requests-in-flight (RIF). We explore the major design features of PReQuaL on a testbed system and describe our experience using it to balance load within YouTube, where it has been running for more than a year. PReQuaL has dramatically decreased tail latency, error rates, and resource use, enabling YouTube and other production systems at Google to run at much higher utilization.
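
To make the selection idea in the abstract concrete, here is a minimal sketch of a power-of-d-choices pick driven by probe results rather than CPU load. It is not the paper's implementation: the `Probe` record, the `pick_server` function, and in particular the rule for combining the two signals (fewest RIF first, ties broken by lower estimated latency) are assumptions made for illustration, since the abstract does not say how RIF and latency are combined. Asynchronous probing and probe reuse are not modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe reply carrying one server's load signals."""
    server: str
    rif: int            # requests in flight reported by the server
    latency_ms: float   # server's estimated request latency


def pick_server(probes, d, rng):
    """Sample up to d probes and return the preferred server.

    Assumed rule (illustrative only): prefer the fewest requests in
    flight, then the lowest estimated latency.
    """
    if not probes:
        raise ValueError("no probes available")
    # Power of d choices: compare only a small random subset of servers.
    candidates = rng.sample(probes, min(d, len(probes)))
    best = min(candidates, key=lambda p: (p.rif, p.latency_ms))
    return best.server


if __name__ == "__main__":
    pool = [
        Probe("a", rif=5, latency_ms=10.0),
        Probe("b", rif=2, latency_ms=30.0),
        Probe("c", rif=2, latency_ms=12.0),
    ]
    # With d equal to the pool size every probe is compared, so the
    # result does not depend on the random seed.
    print(pick_server(pool, d=3, rng=random.Random(0)))  # prints "c"
```

With a smaller `d`, the choice depends on which probes are sampled, which is the usual trade-off in this family of schemes: fewer comparisons per request in exchange for a slightly less informed decision.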

NSDI '24 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)


BibTeX
@inproceedings {295611,
author = {Bartek Wydrowski and Robert Kleinberg and Stephen M. Rumble and Aaron Archer},
title = {Load is not what you should balance: Introducing Prequal},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {1285--1299},
url = {https://www.usenix.org/conference/nsdi24/presentation/wydrowski},
publisher = {USENIX Association},
month = apr
}