Evolution of Incident Management at Slack

Note: Presentation times are in Coordinated Universal Time (UTC).

Thursday, 14 October, 2021 - 02:00–02:30

D. Brent Chapman, Slack

Abstract: 

At Slack, we deliver over 150 million messages per minute at peak. Some fraction of those messages is us, managing incidents affecting the same platform that so many have come to rely on to manage their own incidents. How do we handle dozens of incidents a week, big and small, most of which our users are never aware of? Learn how we've made incident management a core capability of everyone on our engineering team: where we are, how we got here, and where we're going.


Brent Chapman is a Staff Engineer at Slack, with a focus on incident management. This involves building Slack's incident management capabilities; keeping incident management running smoothly day-to-day; helping the company learn from past incidents, and prevent and prepare for future incidents; and sharing Slack's incident management story with customers and the industry at large. Brent leads the training program for incident management processes throughout Slack Engineering, is the lead developer and instructor of Slack's classes for incident commanders, and is the co-developer and lead instructor of our classes for incident responders (which includes all engineers at Slack). He frequently coaches incident commanders throughout the company, at all levels, especially when they are new and growing into the role.

SREcon21 Open Access Sponsored by Indeed

BibTeX
@conference {276743,
author = {D. Brent Chapman},
title = {Evolution of Incident Management at Slack},
year = {2021},
publisher = {USENIX Association},
month = oct
}
