Rob Sherwood, NetDebug.com; Jinghao Shi, Ying Zhang, Neil Spring, Srikanth Sundaresan, Jasmeet Bagga, Prathyusha Peddi, Vineela Kukkadapu, Rashmi Shrivastava, Manikantan KR, Pavan Patil, Srikrishna Gopu, Varun Varadan, Ethan Shi, Hany Morsy, Yuting Bu, Renjie Yang, Rasmus Jönsson, Wei Zhang, Jesus Jussepen Arredondo, and Diana Saha, Meta Platforms Inc.; Sean Choi, Santa Clara University
Network operators have long struggled to achieve reliability. Increased complexity risks surprising interactions, increased downtime, and lost person-hours trying to debug correctness and performance problems in large systems. For these reasons, network operators have also long pushed back on deploying promising network research, fearing the unexpected consequences of increased network complexity. Despite the changes’ potential benefits, the corresponding increase in complexity may result in a net loss.
In software engineering, the established method for building reliability despite complexity is testing. In this paper, we use statistics from a large-scale network to identify unique challenges in network testing. To tackle these challenges, we develop Netcastle: a system that provides continuous integration/continuous deployment (CI/CD) network testing as a service for 11 different networking teams, across 68 different use cases and O(1k) test devices. Netcastle supports comprehensive network testing, including device-level firmware, datacenter distributed control planes, and backbone centralized controllers, and runs 500K+ network tests per day, a scale and depth of test coverage previously unpublished. We share five years of experience in building and running Netcastle at Meta.
author = {Rob Sherwood and Jinghao Shi and Ying Zhang and Neil Spring and Srikanth Sundaresan and Jasmeet Bagga and Prathyusha Peddi and Vineela Kukkadapu and Rashmi Shrivastava and Manikantan KR and Pavan Patil and Srikrishna Gopu and Varun Varadan and Ethan Shi and Hany Morsy and Yuting Bu and Renjie Yang and Rasmus J{\"o}nsson and Wei Zhang and Jesus Jussepen Arredondo and Diana Saha and Sean Choi},
title = {Netcastle: Network Infrastructure Testing At Scale},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {993--1008},
url = {https://www.usenix.org/conference/nsdi24/presentation/sherwood},
publisher = {USENIX Association},
month = apr
}