Monitoring and Alerting

Wednesday, 30 October, 2024 - 14:00–15:30 GMT

Daria Barteneva, Microsoft Azure, and Niall Murphy, Stanza

Abstract:

This session is an opportunity for people to come together and discuss monitoring and alerting, facilitated by our knowledgeable guides. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in monitoring and alerting.

Daria Barteneva, Microsoft Azure

Daria is a Principal Site Reliability Engineer in Observability Engineering in Azure. With a background in Applied Mathematics, Artificial Intelligence, and Music, Daria is passionate about machine learning, diversity in tech, and opera. In her current role, Daria is focused on changing organisational culture, processes, and platforms to improve service reliability and on-call experience. She has spoken at conferences on various aspects of reliability and human factors that play a key role in engineering practices, and has written for O'Reilly. Originally from Moscow, Russia, Daria spent 20 years in Portugal and 10 years in Ireland, and now lives in the Pacific Northwest.

Niall Murphy, Stanza

Niall is the CEO of Stanza Systems, has held various engineering and leadership roles at Microsoft, Google, and Amazon, and is the instigator of the best-selling and prize-winning Site Reliability Engineering, which he hopes at some stage to live down. His most recent book is Reliable Machine Learning, with Todd Underwood and many others.

BibTeX
@conference {304268,
author = {Daria Barteneva and Niall Murphy},
title = {Monitoring and Alerting},
year = {2024},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}