What Breaks Our Systems: A Taxonomy of Black Swans

Monday, March 25, 2019 - 9:00 am–9:30 am

Laura Nolan, Slack

Abstract: 

Black swan events: unforeseen, unanticipated, and catastrophic issues. These are the incidents that take our systems down, hard, and keep them down for a long time.

By definition, you cannot predict true black swans. But black swans often fall into certain categories that we've seen before. This talk examines those categories, which include unforeseen hard capacity limits, cascading failures, hidden system dependencies, and more, and how we can harden our systems against them.

Laura Nolan, Slack

Laura Nolan's background is in Site Reliability Engineering, software engineering, distributed systems, and computer science. She wrote the 'Managing Critical State' chapter in the O'Reilly 'Site Reliability Engineering' book and contributed to the more recent 'Seeking SRE'.

SREcon19 Americas Open Access Videos Sponsored by Salesforce


BibTeX
@conference {229525,
author = {Laura Nolan},
title = {What Breaks Our Systems: A Taxonomy of Black Swans},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar
}
