Matthew Simons, Workiva
It's 4 AM and your phone is ringing. The system is in full meltdown and the company is losing money. Start the caffeine drip and get ready for a rough day.
We've all been there. When the world is on fire, Reliability Engineers are almost always on the front lines, extinguishers in hand. For the sake of our own sanity, what can we do to minimize the frequency and impact of production crises? How can we engineer the system to be more resilient, and what best practices can we employ to that end?
These are the questions we've been asking at Workiva as we've grown and scaled our operations from a small startup to a publicly traded company in only a few years. We're still sane (mostly), so we'd like to share what's worked for us. Hopefully it will help you too.
Matthew is a Reliability Engineer who works for a Top-10 tech company you've probably never heard of: Workiva (ranked #4 in 2016's Best Tech Companies to Work For). He grew up in the Bay Area and now resides in Ames, Iowa. He's an entrepreneur and a passionate driver of innovation, relentlessly pursuing higher levels of automation and process streamlining. He's also a woodworker, a chef, and a PC games enthusiast in his spare time.
author = {Matthew Simons},
title = {It{\textquoteright}s the End of the World as We Know It (and I Feel Fine): Engineering for Crisis Response at Scale},
year = {2017},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}