Jia Guo, LinkedIn, Yifan (Sabrina) Zhao, Netflix, and Dino Occhialini, LinkedIn
Join us for this session to learn how cost-to-serve was reduced by nearly 50% for the Apache Pinot OLAP database's production fleet of ~14K machines at LinkedIn.
OLAP workloads running on Pinot at LinkedIn have diverse characteristics in terms of:
- Varying workload demand (SLOs as low as P99 query latency < 100ms at 100K read QPS).
- Varying cost / resource usage (CPU, memory, IO) of SQL queries.
- Varying dataset sizes (clusters serving data from as low as 500GB to as high as 2PB).
The talk will go into the details of the core cost optimization algorithm, which considers several factors to recommend an optimal SKU:
- Multiple SKU Profiles
- Low-overhead mechanisms to collect high cardinality profiling data from production clusters
- Resource constraints (CPU, memory, disk IOPS, throughput, etc.)
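To make the idea of SKU recommendation under resource constraints concrete, here is a minimal sketch. It is not the algorithm presented in the talk, and every name in it (`Sku`, `Demand`, `hosts_needed`, `recommend_sku`, the `headroom` parameter) is hypothetical. It assumes a cluster's peak demand has already been profiled per resource, sizes each SKU by its most constrained resource, and picks the cheapest resulting fleet.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    """Hypothetical hardware profile: per-host capacity and hourly cost."""
    name: str
    cpu_cores: float
    memory_gb: float
    disk_iops: float
    hourly_cost: float


@dataclass(frozen=True)
class Demand:
    """Hypothetical profiled peak demand for one cluster."""
    cpu_cores: float
    memory_gb: float
    disk_iops: float


def hosts_needed(sku: Sku, demand: Demand, headroom: float = 0.8) -> int:
    """Hosts required so that no resource exceeds `headroom` of capacity."""
    ratios = (
        demand.cpu_cores / (sku.cpu_cores * headroom),
        demand.memory_gb / (sku.memory_gb * headroom),
        demand.disk_iops / (sku.disk_iops * headroom),
    )
    # The most constrained resource determines the fleet size.
    return max(1, math.ceil(max(ratios)))


def recommend_sku(skus, demand: Demand, headroom: float = 0.8):
    """Return (sku name, host count, total hourly cost) of the cheapest fit."""
    def total_cost(sku: Sku) -> float:
        return hosts_needed(sku, demand, headroom) * sku.hourly_cost

    # Break cost ties by name so the result is deterministic.
    best = min(skus, key=lambda s: (total_cost(s), s.name))
    return best.name, hosts_needed(best, demand, headroom), total_cost(best)
```

In this sketch a memory-bound workload ends up on a memory-dense SKU even when that SKU costs more per host, because fewer hosts are needed overall. A production system would also have to account for factors the sketch ignores, such as query latency SLOs and data placement.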
The system has been built with the goal of supporting "Multiple SKUs" effectively, both in terms of cost optimization and in keeping operational overhead minimal (fully automated). In our talk, we will go into the details of all the infrastructure pieces we have built to deliver the solution in a generic fashion.
We will further discuss how this has been integrated into our day-to-day operational machinery.

Jia is a Senior Software Engineer at LinkedIn and a committer for Apache Pinot. Jia focuses on making Pinot fault-tolerant and cost-effective. He has contributed across different areas of Pinot, ranging from the OLAP engine and indexing to fault-tolerant shard placement and several performance improvements.

Sabrina is a Software Engineer at Netflix, focusing on relational datastores. Previously, she worked on the OLAP system Pinot at LinkedIn and was a contributor to Apache Pinot, where she contributed features such as SQL pagination, availability improvements for massive multi-tenant clusters, OLAP SQL enhancements, and fault-tolerant shard placement.

Dino is a Staff Software Engineer at LinkedIn and a contributor to Apache Pinot. Dino has been a strong SRE leader for the Pinot team at LinkedIn and has made many noteworthy contributions towards improving Pinot's operational excellence, resiliency, Site-Up, provisioning, and usability posture. Dino has also contributed heavily towards making Pinot more reliable and performant.

author = {Jia Guo and Yifan (Sabrina) Zhao and Dino Occhialini},
title = {Fully Automated {HW} {SKU} Selection System to Optimize Apache {Pinot{\textquoteright}s} {Cost-to-Serve} at {LinkedIn}},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}