Cody Wilbourn, Parse.ly
On-call teams, postmortems, and the costs of downtime are well-covered topics in DevOps. What's rarely discussed is the cost of false alarms in your alerting. This noise hinders the team's ability to handle real issues effectively. What are these hidden costs, and how do you eliminate false alarms?
While you're at LISA17, how many monitoring emails do you expect to receive? 50? 100? How many of those need someone's intervention? Odds are that most of those emails won't send you off into a corner with your laptop to fix something critical.
Noisy monitoring system defaults and untuned alerts barrage us with information we don't need. Those false alerts have a cost, even if it's not directly attributable to payroll. We'll walk through some of these costs, their dollar impact on companies, and strategies to reduce false alarms.
Cody has worked in various operational roles for almost a decade and has been on call for most of that time. His background is in batch compute systems: he formerly managed storage and compute resources at Intel Austin for Atom processor design, and he now helps provide real-time web analytics for some of the world's top news sites and publishers at Parse.ly. At Parse.ly, Cody reduced pager alerts by 85% and informational notifications by 70%, most of which were false alarms.
author = {Cody Wilbourn},
title = {The Hidden Costs of {On-Call}: False Alarms},
year = {2017},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = oct
}