An Exploration in Storing Telemetry in Cloud Object Storage

Thursday, 31 October, 2024 - 09:50–10:30 GMT

Mike Heffner and Ray Jenkins, Streamfold

Abstract: 

Modern web application architectures require extensive telemetry data to function efficiently at scale. Traditional methods for collecting, storing, and processing this data have become increasingly expensive and challenging to maintain. Meanwhile, the prevalence of cloud object storage has given rise to the data lake. This has led some organizations to explore telemetry data lakes, which enable cost-efficient storage of large volumes of telemetry data.

We will explore various data storage formats used in constructing telemetry data lakes and discuss the tradeoffs associated with each approach. We will delve into common file formats such as JSON, Parquet, and ORC, as well as table formats such as Apache Iceberg, examining how they can be used to store telemetry data like logs, metrics, and traces at scale. These formats will be evaluated empirically using real-world datasets. Additionally, we will review recent literature that highlights where the design of storage formats can be improved to better align with modern computing hardware.

Mike Heffner, Streamfold

Mike Heffner is co-founder of Streamfold, where they are creating the first telemetry pipeline built for developers. Prior to Streamfold, Mike was a backend engineer at Netlify helping scale their delivery network, and at Librato building one of the first monitoring SaaS products. In his free time he takes advantage of all that the Blue Ridge Mountains have to offer.

Ray Jenkins, Streamfold

Ray Jenkins is co-founder of Streamfold, where they are creating the first telemetry pipeline built for developers. Prior to founding Streamfold, he led software engineering efforts at Snowflake on the observability and performance of FoundationDB, and at Segment on the development of their stream processing pipeline, identity resolution system, and message delivery platforms.

BibTeX
@conference {302235,
author = {Mike Heffner and Ray Jenkins},
title = {An Exploration in Storing Telemetry in Cloud Object Storage},
year = {2024},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}