Theo Klein, Google
Want to prevent outages before they happen? Traditional SRE methods focus on component failures, but a whole class of outages stems from unexpected system interactions. We found a solution.
On our team, we use System-Theoretic Process Analysis (STPA) to identify and fix system-level vulnerabilities before they cause outages. By applying STPA during the design phase, we've prevented major incidents and saved countless engineering hours.
This talk will show you how STPA can transform your approach to reliability. We'll share a real-world example where STPA caught critical design flaws that traditional methods missed, saving us months of costly rework.
Don't wait for outages to happen. Learn how STPA can help you build more resilient systems and become a 1000x engineer.

Theo Klein is a Senior Site Reliability Engineer working on Google Maps. Over the past year, he has led an effort to improve the safety and reliability of road disruptions data on Google Maps. Previously, he led efforts to remove unneeded dependencies on critical systems, which de-risked Google's many serving layers from global outages.
His primary interests are in systems thinking, dependency management and horizontal analyses of large-scale systems.

author = {Theo Klein},
title = {Mapping a Better Future with {STPA}},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}