DQBarge: Improving Data-Quality Tradeoffs in Large-Scale Internet Services

Authors: 

Michael Chow, University of Michigan; Kaushik Veeraraghavan, Facebook, Inc.; Michael Cafarella and Jason Flinn, University of Michigan

Abstract: 

Modern Internet services often involve hundreds of distinct software components cooperating to handle a single user request. Each component must balance the competing goals of minimizing service response time and maximizing the quality of the service provided. This leads to low-level components making data-quality tradeoffs, which we define to be explicit decisions to return lower-fidelity data in order to improve response time or minimize resource usage.

We first perform a comprehensive study of low-level data-quality tradeoffs at Facebook. We find that such tradeoffs are widespread. We also find that existing data-quality tradeoffs are often suboptimal because the low-level components making the tradeoffs lack global knowledge that could enable better decisions. Finally, we find that most tradeoffs are reactive, rather than proactive, and so waste resources and fail to mitigate system overload.

Next, we develop DQBarge, a system that enables better data-quality tradeoffs by propagating critical information along the causal path of request processing. This information includes data provenance, load metrics, and critical path predictions. DQBarge generates performance and quality models that help low-level components make better and more proactive tradeoffs. Our evaluation shows that DQBarge helps Internet services mitigate load spikes, improve utilization of spare resources, and implement dynamic capacity planning.
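To make the idea of a proactive, information-driven tradeoff concrete, here is a minimal Python sketch of a low-level component that reads metadata propagated with a request (a load metric and a critical-path prediction) and consults simple performance and quality models to pick a fidelity level before doing any work. This is not DQBarge's implementation or API. `RequestContext`, `choose_fidelity`, the three fidelity levels, the numbers in the model tables, the 0.9 overload threshold, and the rule that shrinks the latency budget as load rises are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical per-fidelity models (assumed to be learned offline):
# estimated latency in ms and estimated result quality in [0, 1].
LATENCY_MODEL: Dict[str, float] = {"full": 40.0, "reduced": 15.0, "cached": 2.0}
QUALITY_MODEL: Dict[str, float] = {"full": 1.00, "reduced": 0.90, "cached": 0.75}


@dataclass
class RequestContext:
    """Hypothetical metadata propagated with a request along its causal path."""
    load: float             # assumed: fraction of capacity in use, 0.0 to 1.0
    on_critical_path: bool  # assumed prediction: does this call gate end-to-end latency?
    provenance: Dict[str, str] = field(default_factory=dict)  # carried along, unused here


def choose_fidelity(ctx: RequestContext, latency_budget_ms: float) -> str:
    """Pick a fidelity level up front (proactively), rather than after a timeout."""
    # Off the critical path, extra latency is assumed to be hidden by slower
    # sibling calls, so spend it on quality unless the system is near overload.
    if not ctx.on_critical_path and ctx.load < 0.9:
        return "full"
    # Assumed rule: shrink the usable latency budget as load rises.
    budget = latency_budget_ms * max(0.0, 1.0 - ctx.load)
    candidates = [lvl for lvl, ms in LATENCY_MODEL.items() if ms <= budget]
    if not candidates:
        # Nothing fits the budget: fall back to the cheapest option.
        return min(LATENCY_MODEL, key=LATENCY_MODEL.get)
    # Otherwise return the highest-quality option that fits the budget.
    return max(candidates, key=QUALITY_MODEL.get)


if __name__ == "__main__":
    print(choose_fidelity(RequestContext(load=0.2, on_critical_path=True), 50.0))    # full
    print(choose_fidelity(RequestContext(load=0.6, on_critical_path=True), 50.0))    # reduced
    print(choose_fidelity(RequestContext(load=0.95, on_critical_path=False), 50.0))  # cached
```

The point of the sketch is the contrast with a reactive tradeoff: the component degrades quality because propagated information predicts it should, not because a deadline has already been missed, and it keeps full quality when the critical-path prediction says the extra latency is unlikely to matter.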


BibTeX
@inproceedings {199323,
author = {Michael Chow and Kaushik Veeraraghavan and Michael Cafarella and Jason Flinn},
title = {{DQBarge}: Improving {Data-Quality} Tradeoffs in {Large-Scale} Internet Services},
booktitle = {12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)},
year = {2016},
isbn = {978-1-931971-33-1},
address = {Savannah, GA},
pages = {771--786},
url = {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/chow},
publisher = {USENIX Association},
month = nov
}
