Learning from Learnings: Anatomy of Three Incidents

Wednesday, March 27, 2019 - 10:10 am10:40 am

Randy Shoup, WeWork

Abstract: 

The best response to a system outage is not "What did you do?", but "What did we learn?" This session will walk through three system-wide outages at Google, at Stitch Fix, and at WeWork—their incidents, aftermaths, and recoveries. In all cases, many things went right and a few went wrong; also in all cases, because of blameless cultures, we buckled down, learned a lot, and made substantial improvements in the systems for the future. Looking back with the perspective of 20-20 hindsight, all of these incidents were seminal events that changed the focus and trajectory of engineering at each organization. You will leave with a set of actionable suggestions in dealing with customers, engineering teams, and upper management. You will also enjoy a few war stories from the trenches, none of which has been previously told fully in public.

Randy Shoup, WeWork

Over the past several decades, Randy Shoup has led high-performing engineering teams at eBay, Google, Stitch Fix, and WeWork. A long-time advocate of DevOps practices, Randy specializes in scaling engineering organizations, company cultures, and technology infrastructures. He is equally excited to talk about empathy and learning cultures as discuss distributed systems, microservices, or containers.

SREcon19 Americas Open Access Videos Sponsored by
Salesforce

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {229545,
author = {Randy Shoup},
title = {Learning from Learnings: Anatomy of Three Incidents},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar
}

Presentation Video