Fault Tree Analysis Applied to Apache Kafka

Andrey Falko

Friday, 4 October, 2019 - 16:00–16:45

Andrey Falko, Lyft

Abstract:

This talk should provide a framework for answering the following common questions a Kafka operator or user might have: What should the replication factor be for your Kafka topics? How many partitions should you have? How many consumers should you provision? What should your in-sync replica (ISR) setting be? Should you use RAID or not?

Andrey Falko, Lyft

Andrey Falko is one of the first Reliability Software Engineers hired at Lyft, where he has been for more than a year. He is currently focused on building and scaling reliable PubSub systems for Lyft's Data Platform. Prior to Lyft, Andrey worked at Salesforce for nine years, where he researched Kafka and Pulsar performance and reliability. While there, he also built an IaaS system, many CI/CD systems, a Zipkin service, and features for the Salesforce platform.

