Zhaoyu Gao and Anubhavnidhi Abhashkumar, ByteDance; Zhen Sun, Cornell University; Weirong Jiang and Yi Wang, ByteDance
This paper presents the design, implementation, evaluation, and deployment of Crescent, ByteDance’s network emulation platform for preventing change-induced network incidents. Inspired by prior art such as CrystalNet, Crescent achieves high fidelity by running switch vendor images inside containers. However, we take a different route to scaling up the emulator and address the unique challenges it raises. First, we analyze our past network incidents to reveal the difficulty in identifying a safe emulation boundary. Instead of emulating the entire network, we exploit the inherent symmetry and modularity of data center network architectures to strike a balance between coverage and resource cost. Second, we study the node-to-host assignment by formulating it as a graph partitioning problem. Evaluation results show that our partitioning algorithm reduces the testbed bootup time by up to 20× compared with random partitioning. Third, we develop an incremental approach to modify the emulated network on the fly. This approach can be 30× faster than creating a new testbed of the same scale. Crescent has been actively used for three and a half years and has led to a significant reduction in change-induced network incidents. We also share Crescent’s success in many other use cases and the critical lessons learned from its deployment.
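To make the graph-partitioning view of node-to-host assignment concrete, the following is a minimal sketch and not the paper's algorithm. It assumes, purely for illustration, that the objective is to minimize the number of emulated links that cross physical hosts under a per-host capacity; the abstract does not state Crescent's actual objective or method. The toy pod topology, the round-robin baseline, and the names `build_pods`, `round_robin`, `refine`, and `cut_edges` are all hypothetical.

```python
from collections import Counter


def build_pods(num_pods=4, leaves_per_pod=3, spines_per_pod=2):
    """Hypothetical toy topology: leaf/spine pods joined by one core switch."""
    nodes, edges = [], []
    for p in range(num_pods):
        leaves = [f"p{p}-leaf{i}" for i in range(leaves_per_pod)]
        spines = [f"p{p}-spine{i}" for i in range(spines_per_pod)]
        nodes += leaves + spines
        # Every leaf connects to every spine inside its own pod.
        edges += [(leaf, spine) for leaf in leaves for spine in spines]
    nodes.append("core0")
    edges += [("core0", f"p{p}-spine{i}")
              for p in range(num_pods) for i in range(spines_per_pod)]
    return nodes, edges


def cut_edges(edges, assignment):
    """Count links whose two endpoints are placed on different hosts."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def round_robin(nodes, num_hosts):
    """Topology-unaware baseline: deal nodes out to hosts in order."""
    return {n: i % num_hosts for i, n in enumerate(nodes)}


def refine(nodes, edges, assignment, num_hosts, capacity):
    """Greedy local search: move a node to the host holding more of its
    neighbors, if that host has spare capacity. Every accepted move
    strictly lowers the cut, so the loop terminates."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    assignment = dict(assignment)
    load = Counter(assignment.values())
    improved = True
    while improved:
        improved = False
        for n in nodes:
            here = assignment[n]
            local = Counter(assignment[m] for m in adj[n])
            candidates = [h for h in range(num_hosts)
                          if h == here or load[h] < capacity]
            # Prefer the host with most neighbors; ties favor staying put.
            best = max(candidates, key=lambda h: (local[h], h == here))
            if local[best] > local[here]:
                load[here] -= 1
                load[best] += 1
                assignment[n] = best
                improved = True
    return assignment


nodes, edges = build_pods()
baseline = round_robin(nodes, num_hosts=4)
refined = refine(nodes, edges, baseline, num_hosts=4, capacity=7)
baseline_cut = cut_edges(edges, baseline)
refined_cut = cut_edges(edges, refined)
print("cross-host links:", baseline_cut, "->", refined_cut)
```

The local search only ever accepts moves that reduce the number of cross-host links, so it can stall in a local optimum; a production system would more plausibly use a dedicated multilevel partitioner and a cost model tuned to what actually dominates bootup time.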
NSDI '24 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)
@inproceedings {gao2024crescent,
author = {Zhaoyu Gao and Anubhavnidhi Abhashkumar and Zhen Sun and Weirong Jiang and Yi Wang},
title = {Crescent: Emulating Heterogeneous Production Network at Scale},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {1045--1062},
url = {https://www.usenix.org/conference/nsdi24/presentation/gao-zhaoyu},
publisher = {USENIX Association},
month = apr
}