Fixing On-call, or How to Sleep Through the Night
Authors:
Matt Provost, Weta Digital
Abstract:
Monitoring systems are some of the most critical pieces of infrastructure for a systems administration team. They can also be a major cause of sleepless nights and lost weekends for the on-call sysadmin. This paper looks at a mature Nagios system that has been in continuous use for seven years with the same team of sysadmins. By 2012 it had grown into something that was causing significant disruption for the team, and there was a major push to reform it into something more reasonable. We look at how a reduction in after-hours alerts was achieved, together with an increase in overall reliability, and what lessons were learned from this effort.
Matt Provost is the Systems Manager at Weta Digital, a five-time Academy Award–winning visual effects facility in Wellington, New Zealand. The Systems team at Weta is responsible for all of the company's servers, storage, and networking, and runs a 49,000-core renderwall. Matt has been a system and network administrator for over 15 years. He has a B.A. from Indiana University, Bloomington.
BibTeX
@inproceedings {179103,
author = {Matt Provost},
title = {Fixing On-call, or How to Sleep Through the Night},
booktitle = {27th Large Installation System Administration Conference (LISA 13)},
year = {2013},
isbn = {978-1-931971-05-8},
address = {Washington, D.C.},
pages = {7--16},
url = {https://www.usenix.org/conference/lisa13/technical-sessions/presentation/provost},
publisher = {USENIX Association},
month = nov
}