Laura de Vesine and Jamie Luck, Datadog, Inc.
Businesses run on metrics. They use them to judge success, identify areas for investment, and reward employees. Unfortunately, naive metrics can do more harm than good, especially for low-frequency events like incidents. Management teams often reach for MTTR (mean time to recovery) or raw incident counts to judge the success of reliability and resilience programs, but these metrics generate spurious insights and perverse incentives. As SREs, we can't simply tell the business not to measure them; we need to offer alternatives. This talk presents a starting list of things to measure instead (and how to build your own list), as well as a framework for educating less technical stakeholders about the actual value proposition of incident management.
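To illustrate why a mean over a handful of incidents is fragile, here is a minimal sketch using hypothetical recovery times (illustrative numbers, not data from the talk). With only three incidents per quarter, one long outage multiplies MTTR several times over even though the typical incident barely changed. The median is printed only to make that sensitivity visible; it is not presented as the talk's recommended replacement metric.

```python
from statistics import mean, median

# Hypothetical recovery times in minutes, one list per quarter (illustrative only).
quarters = {
    "Q1": [30, 45, 40],
    "Q2": [35, 50, 600],  # a single long incident dominates this quarter
}

def mttr(durations):
    """Mean time to recovery: the arithmetic mean of recovery durations."""
    return mean(durations)

for name, durations in quarters.items():
    # Compare the mean against the median to show how one outlier moves MTTR.
    print(f"{name}: n={len(durations)} MTTR={mttr(durations):.1f} min "
          f"median={median(durations)} min")
```

Here Q1 reports an MTTR of about 38 minutes and Q2 about 228 minutes, while the median moves only from 40 to 50 minutes: a "trend" driven almost entirely by a single event.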

Laura de Vesine is a software industry veteran of more than 20 years. She has spent the last 9 years in SRE, working on incident analysis and prevention, chaos engineering, and the intersection of technology and organizational culture, with a recent expansion into security. Laura is currently a senior staff engineer at Datadog, Inc. She also has a PhD in computer science, but mostly her cats nap on her diploma.

Jamie Luck is a senior SRE working in Incident Management at Datadog. Ever since they broke their first laptop and discovered a free operating system called Linux, they have been hooked. They have worked in the resilience and reliability space for ten years, operating everything from bare-metal SPARC machines to fleets of containers. Passionate about sustainable computing, they spend their free time repairing old machines and putting them back into service. In their current role, they define incident management and on-call practices for a mature engineering organization, completing the cycle of resilience from breakage to systemic improvement.

author = {Laura de Vesine and Jamie Luck},
title = {Incident Management Metrics That Matter},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}