Courtney Eckhardt, Heroku, a Salesforce company
Your site’s back up and you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?
Heroku uses a blameless retrospective process to understand and learn from our operational incidents. This tutorial will share the process we use and give you a chance to practice analyzing operational problems using the internal and external communications of a real Heroku operational incident. Along the way, we’ll discuss how Heroku developed this process, what issues we were trying to solve, and how we’re still iterating on it.
Courtney Eckhardt first got into retrospectives when she signed up for comp.risks as an undergrad (and since then, not as much has changed as we’d like to think). Her perspectives on engineering process improvement are strongly informed by the work of Kathy Sierra and Don Norman (among others).
author = {Courtney Eckhardt},
title = {Running Excellent Retrospectives: What Happened?},
year = {2018},
address = {Nashville, TN},
publisher = {USENIX Association},
month = oct
}
This tutorial is for engineers and engineering managers who want to bring an incident retrospective process to their organization, or to improve one they already have.
Attendees will have the materials and firsthand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.
- Why run a retrospective
- Goal of a retrospective
- Blameless retrospectives
- How to structure a retrospective
- Preparing for a retrospective
- Five “why”s / infinite “how”s
- How to understand human error
While this tutorial is probably not suitable for extremely junior engineers, there are no specific prerequisites.