Data Caching for Enterprise-Grade Petabyte-Scale OLAP

Authors: 

Chunxu Tang and Bin Fan, Alluxio; Jing Zhao and Chen Liang, Uber, Inc.; Yi Wang and Beinan Wang, Alluxio; Ziyue Qiu, Carnegie Mellon University and Uber, Inc.; Lu Qiu, Bowen Ding, Shouzhuo Sun, Saiguang Che, Jiaming Mai, Shouwei Chen, Yu Zhu, and Jianjian Xie, Alluxio; Yutian (James) Sun, Meta, Inc.; Yao Li and Yangjun Zhang, Uber, Inc.; Ke Wang, Meta, Inc.; Mingmin Chen, Uber, Inc.

Abstract: 

With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these challenges, this paper introduces the Alluxio local (edge) cache, a highly effective architectural optimization tailored for such environments. This embeddable cache, optimized for petabyte-scale data analytics, leverages local SSD resources to alleviate network I/O and API call pressures, significantly improving data transfer efficiency. Integrated with OLAP systems like Presto and storage services like HDFS, the Alluxio local cache has demonstrated its effectiveness in handling large-scale, enterprise-grade workloads over three years of deployment at Uber and Meta. We share insights and operational experiences in implementing these optimizations, providing valuable perspectives on managing modern, massive-scale OLAP workloads.
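
To make the idea of an embeddable local cache concrete, the following is a minimal sketch of a read-through, page-granular cache that keeps recently read pages in a local directory (standing in for local SSD) and falls back to remote storage on a miss. It is an illustration only, not the Alluxio implementation: the class name `LocalPageCache`, the `remote_read` callback, the tiny `PAGE_SIZE`, and the LRU eviction policy are all assumptions made for this example.

```python
import os
import tempfile
from collections import OrderedDict
from typing import Callable, Tuple

# Tiny page size so the behavior is easy to follow (assumption for illustration;
# a production cache would use far larger pages).
PAGE_SIZE = 4


class LocalPageCache:
    """Hypothetical read-through page cache with LRU eviction on local disk."""

    def __init__(self, cache_dir: str, capacity_pages: int,
                 remote_read: Callable[[str, int, int], bytes]):
        if capacity_pages < 1:
            raise ValueError("capacity_pages must be at least 1")
        self.cache_dir = cache_dir
        self.capacity_pages = capacity_pages
        # remote_read(file_id, offset, length) stands in for a remote storage call.
        self.remote_read = remote_read
        # Maps (file_id, page index) -> local file path, ordered oldest first.
        self._pages: "OrderedDict[Tuple[str, int], str]" = OrderedDict()
        self._next_id = 0
        self.hits = 0
        self.misses = 0

    def _get_page(self, file_id: str, page: int) -> bytes:
        key = (file_id, page)
        path = self._pages.get(key)
        if path is not None:
            # Cache hit: mark as most recently used and serve from local disk.
            self._pages.move_to_end(key)
            self.hits += 1
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: fetch exactly one page from remote storage.
        self.misses += 1
        data = self.remote_read(file_id, page * PAGE_SIZE, PAGE_SIZE)
        # Evict least recently used pages until there is room for the new one.
        while len(self._pages) >= self.capacity_pages:
            _, victim = self._pages.popitem(last=False)
            os.remove(victim)
        path = os.path.join(self.cache_dir, f"page-{self._next_id}")
        self._next_id += 1
        with open(path, "wb") as f:
            f.write(data)
        self._pages[key] = path
        return data

    def read(self, file_id: str, offset: int, length: int) -> bytes:
        """Read a byte range, serving each covered page from cache when possible."""
        if length <= 0:
            return b""
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        data = b"".join(self._get_page(file_id, p) for p in range(first, last + 1))
        start = offset - first * PAGE_SIZE
        return data[start:start + length]


if __name__ == "__main__":
    blob = {"part-0": b"abcdefghijklmnop"}  # stand-in for a remote file
    with tempfile.TemporaryDirectory() as d:
        cache = LocalPageCache(d, 2, lambda f, off, n: blob[f][off:off + n])
        cache.read("part-0", 2, 6)   # two misses, both pages fetched remotely
        cache.read("part-0", 2, 6)   # two hits, no remote traffic
        print(cache.hits, cache.misses)
```

In this sketch a repeated read of the same byte range touches only local files, which is the effect the abstract describes as relieving network I/O and API call pressure. Caching fixed-size pages rather than whole files is one plausible way to cope with fragmented access, since only the pages actually read occupy local space.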

USENIX ATC '24 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {298597,
author = {Chunxu Tang and Bin Fan and Jing Zhao and Chen Liang and Yi Wang and Beinan Wang and Ziyue Qiu and Lu Qiu and Bowen Ding and Shouzhuo Sun and Saiguang Che and Jiaming Mai and Shouwei Chen and Yu Zhu and Jianjian Xie and Yutian (James) Sun and Yao Li and Yangjun Zhang and Ke Wang and Mingmin Chen},
title = {Data Caching for {Enterprise-Grade} {Petabyte-Scale} {OLAP}},
booktitle = {2024 USENIX Annual Technical Conference (USENIX ATC 24)},
year = {2024},
isbn = {978-1-939133-41-0},
address = {Santa Clara, CA},
pages = {901--915},
url = {https://www.usenix.org/conference/atc24/presentation/tang},
publisher = {USENIX Association},
month = jul
}