A Working Theory-of-Monitoring
Caskey L. Dickson, Site Reliability Engineer, Google Inc.
At Google we have discovered many common pitfalls and false simplifications that cause frustration and blind spots in monitoring systems. Internally we run our own home-grown monitoring systems, but to move beyond a hit-and-miss approach to monitoring we have developed a formal model for such systems. We use this model as a framework for developing, evaluating, and evolving monitoring systems at Google that are suitable for operating at scale.
We will present our model, show how existing open source solutions fit (and don't fit!) into that model, and invite attendees to contrast it with their own experiences. The goal is to encourage a larger discussion of the theory of monitoring and of how current solutions can be evolved into more effective tools for operators of large systems.
Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he writes and maintains monitoring services that operate at "Google scale." He has worked in online service development since 1995. Before coming to Google he was a senior developer at Symantec, wrote software for various internet startups such as CitySearch and CarsDirect, ran a consulting company, and taught undergraduate and graduate computer science at Loyola Marymount University. He has an undergraduate degree in Computer Science, a Master's in Systems Engineering, and an M.B.A. from Loyola Marymount.