How a Single API Endpoint Saved Us 3000 CPU

Wednesday, 30 October, 2024 - 15:10–15:30 GMT

Lasse Hels, Maersk

Abstract: 

How do you run a time series database exclusively on spot nodes? With great difficulty!

Grafana Mimir is the centrepiece of our observability platform at Maersk. For a long time, rollouts of Mimir's most crucial component would consistently trigger significant performance degradations in the platform. Getting to the root cause of the issue proved laborious and took us deep into the internals of Mimir.

Join us as we go through the issue postmortem and reflect on how to create consistency in a chaotic environment. The talk touches on topics such as CPU throttling, hash rings, compute utilisation analysis and metric series cardinality.

Lasse Hels, Maersk

Lasse is a software engineer at Maersk. As a member of the telemetry team, he took part in building the Maersk Observability Platform, and now spends much of his time keeping it running. Outside of work, his interests include speedrunning, powerlifting, etymology, and camels.

BibTeX
@conference {302189,
author = {Lasse Hels},
title = {How a Single {API} Endpoint Saved Us 3000 {CPU}},
year = {2024},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}