Daniel Bittman, Ethan L. Miller, and Peter Alvaro, UC Santa Cruz
Distributed systems are hard to reason about largely because of uncertainty about what may go wrong in a particular execution, and about whether the system will mitigate those faults. Tools that perturb executions can help test whether a system is robust to faults, while tools that observe executions can help better understand their system-wide effects. We present Box of Pain, a tracer and fault injector for unmodified distributed systems that addresses both concerns by interposing at the system call level and dynamically reconstructing the partial order of communication events based on causal relationships. Box of Pain’s lightweight approach to tracing and focus on simulating the effects of partial failures on communication rather than the failures themselves sets it apart from other tracing and fault injection systems. We present evidence of the promise of Box of Pain and its approach to lightweight observation and perturbation of distributed systems.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Daniel Bittman and Ethan L. Miller and Peter Alvaro},
title = {Co-evolving Tracing and Fault Injection with Box of Pain},
booktitle = {11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19)},
year = {2019},
address = {Renton, WA},
url = {https://www.usenix.org/conference/hotcloud19/presentation/bittman},
publisher = {USENIX Association},
month = jul
}