Richard Clawson and Josh Gilliland, Microsoft Azure
One of the hardest things to do is trust an outside voice. What are the boundaries between live site features and service features? How much expertise is required to be on-call? Who decides what’s in the best interests of the service? How is this not another Ops team or a staff augment? Who’s "in charge" and who makes prioritization calls? How do you build mutual trust? These are just some of the challenges in building a successful partnership between a product group and SRE.
In this talk we will present what we learned about the technical, organizational, and political systems needed to provide SRE to the Azure Internet-of-Things product group, and how this can be used as a template for your services. We will discuss how to start an engagement, build partnerships and trust across organizations, provide ROI, and keep a distinct identity, along with the frameworks that were developed to maintain tight organizational alignment, including a new take on error budgets.
Let’s continue the conversation!
Richard Clawson, Microsoft Azure
Richard Clawson is a Site Reliability Engineer on the Azure SRE Team, which is working to improve operations across the Azure stack. Currently he is focused on creating repeatable patterns and practices for SRE engagements. Before Azure, he was a software engineering manager on the Cortana speech platform and on the MSN publishing platform.
Josh Gilliland, Microsoft Azure
author = {Richard Clawson and Josh Gilliland},
title = {The Why, What, and How of Starting an {SRE} Engagement},
year = {2017},
address = {Dublin},
publisher = {USENIX Association},
month = aug
}