A Tale of Two Postmortems: A Human Factors View

Wednesday, June 12, 2019 - 9:10 am–9:55 am

Tanner Lund, Microsoft

Abstract: 

Many companies become frustrated with their postmortem and incident review process, feeling that it is a burden, or that it does not provide meaningful insights, or that the repairs and learnings generated do not help prevent repeats or other incidents. Fortunately, there is a better way to do things, backed by decades of scientific rigor and proven in industries where outages can mean a lot worse than lost revenue.

Join our fictional company, "Potato Systems," as they deal with the aftermath of a catastrophic incident. As they struggle to learn from it and move forward, they, and we, will come to understand the stark contrast in outcomes and effectiveness of Safety I vs Safety II thinking.

Tanner Lund has been a part of Azure's SRE organization from the beginning. He has worked in a variety of roles, including crisis management, developing SREBot, building data pipelines, and leading services through SRE/DevOps transitions. Throughout it all, his focus has been on understanding complex systems and how we achieve our goals through them, seeking to unlock their secrets.

BibTeX
@conference {233259,
author = {Tanner Lund},
title = {A Tale of Two Postmortems: A Human Factors View},
year = {2019},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}
