“On-Call Is Ruining My Life” and Other Tales about Holding the Pager as an SRE

Tuesday, March 25, 2025 - 4:45 pm–5:30 pm PDT

Cory Watson

Abstract: 

There’s no other part of SRE life that evokes such a strong reaction as being on-call. From the fear and anticipation of your first shift to the white-knuckle drama of a total system outage and the joy and satisfaction of debugging a particularly thorny issue, holding the pager is as much a human experience as a technical one. Let's talk about it!

We've done some surveys, pored over the literature, marinated in our experiences, and have some findings. What models are in use? How do we feel about this work? What impact does it have? Can we do better? Will I get a pony? OK, maybe not the last one.

I'll present some provocative findings that question the status quo around on-call and suggest some experiments you can take back and test out. Maybe there will be a pony?

Cory Watson is an engineer and founder. Cory transitioned to a focus on reliability and observability as an early SRE at Twitter, founded the observability team at Stripe, and spent time at vendors SignalFx and Splunk. He is a strong voice in the observability community, through OSS, popular tweets, blog posts and speaking engagements.

Cory has over 20 years of software engineering experience and is an active founder of and contributor to several successful open source projects. Before finding his passion in reliability, he worked in several industries, including e-commerce, consulting, healthcare, and fintech.

BibTeX
@conference {305509,
author = {Cory Watson},
title = {{{\textquotedblleft}On-Call} Is Ruining My {Life{\textquotedblright}} and Other Tales about Holding the Pager as an {SRE}},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}