Dude, You Forgot the Feedback: How Your Open Loop Control Planes Are Causing Outages

Tuesday, 29 October, 2024 - 09:00–09:45 GMT

Laura de Vesine, Datadog, Inc.

Abstract: 

It's a strong principle of good UX design that users should get feedback about the results of their actions, to help prevent errors. Experienced SREs know to build additional observability into systems so that we can watch them change as we mutate them, but that observability is typically out-of-band and takes conscious, deliberate action to consult, so getting good feedback on our actions requires constant vigilance and training of new users. What if we instead built control planes that tell us exactly what we've done, and what effect it is having? This talk explores various patterns of "fire and forget" control planes in production systems, how each one contributes to outages, and some simple solutions to build better tools for operations.

Laura de Vesine, Datadog, Inc.

Laura de Vesine is a 20+ year software industry veteran. She has spent the last 8 years in SRE working in incident analysis and prevention, chaos engineering, and the intersection of technology and organizational culture. Laura is currently a staff engineer at Datadog, Inc. She also has a PhD in computer science, but mostly her cats nap on her diploma.

BibTeX
@conference {302197,
author = {Laura de Vesine},
title = {Dude, You Forgot the Feedback: How Your Open Loop Control Planes Are Causing Outages},
year = {2024},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}