Clayton Krueger, USAA
In this session, we’ll explore how a financial services provider has developed a comprehensive, automated chaos engineering program, supported by strong leadership. While chaos testing is commonly applied to individual applications, we’ve extended the practice to an entire data center. This journey didn’t happen overnight, and we’ll take you through the key stages of our progress. We’ll discuss the major challenges we faced, specifically around fear, uncertainty, and doubt. Attendees will gain insights into the tools and strategies we used to overcome obstacles and the lessons learned along the way. Additionally, we’ll share our plans for future efforts and how we aim to further enhance the robustness of our infrastructure. This session is ideal for anyone looking to deepen their understanding of large-scale chaos engineering in a complex environment.

Clayton Krueger is a trailblazing leader and founding member of the SRE team at USAA, where he has played a pivotal role in shaping the company’s infrastructure resiliency strategy. Clayton has been instrumental in designing and implementing USAA’s core metrics collection and storage frameworks that power the company’s SRE capabilities. Beyond infrastructure, he is driving transformative change in USAA’s problem and change management practices by spearheading automation initiatives that eliminate manual toil and enhance operational efficiency. Clayton is also committed to developing the next generation of elite technical troubleshooters, ensuring that USAA’s teams remain at the forefront of innovation and excellence.

author = {Clayton Krueger},
title = {Chaos Experiments - Datacenter Stress Testing},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}