How Netflix Embraces Failure to Improve Resilience and Maximize Availability
Ariel Tseitlin, Director, Cloud Solutions, Netflix
Netflix created a suite of tools, collectively called the Simian Army, to improve resiliency and maintain the cloud environment. Failure modes are typically corner cases that are poorly tested, if at all. Only by failing often can we ensure that we are resilient to failure. We therefore look for ways to induce failure in our production environment, to better prepare for the failures that will inevitably occur. This presentation will cover the motivation for inducing failure in production and the mechanics of how Netflix achieves it.
Ariel Tseitlin manages the Netflix Cloud and is interested in all things cloudy. At Netflix, he is Director of Cloud Solutions, helping Netflix succeed in the cloud; his responsibilities include cloud tooling, monitoring, performance and scalability, and cloud operations and reliability engineering. Ariel's team builds Asgard and the Simian Army, including the Chaos Monkey. Prior to Netflix, Ariel was VP of Technology and Products at Sungevity, and before that he was the Founder and CEO of CTOWorks.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.