Learning from Incidents at Scale; Actually Doing Cross-Incident Analysis

Wednesday, March 26, 2025 - 11:00 am–11:45 am PDT

Vanessa Huerta Granda, Enova

Abstract: 

For a few years we have discussed the idea of Learning from Incidents, which encourages folks to deeply understand an incident through a thorough investigation of how it came to be. I have personally led these investigations, written about them, and coached folks on them, and while I stand by this process, I have also seen how difficult it is to scale.

In this talk I will describe how my team (Resilience Engineering) has been able to leverage our incident review program to learn from incidents at scale. I will cover how we’ve been able to analyze a universe of incidents broken out by quarter, year, product, and technology, gain insights from them, and make recommendations to improve our sociotechnical systems.

Vanessa is a Technology Manager for Resilience Engineering at Enova. Previously, she worked at Jeli.io, helping companies make the most of their incidents, and she has spent the last decade focusing on Production Incident processes, learning from incidents, and handling Major Incidents as Incident Commander. She has spoken and written on incident metrics and sharing learnings, and in 2021 she co-authored Jeli’s Howie: The Post-Incident Guide. She is passionate about continuous improvement, getting teams to talk to each other, and sharing incident findings.

BibTeX
@conference {305513,
author = {Vanessa Huerta Granda},
title = {Learning from Incidents at Scale; Actually Doing {Cross-Incident} Analysis},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}