Kaicheng Yang and Yuanpeng Li, National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University and Peng Cheng Laboratory, Shenzhen, China; Sheng Long, National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University and Huawei Cloud Computing Technologies Co., Ltd., China; Tong Yang, National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University and Peng Cheng Laboratory, Shenzhen, China; Ruijie Miao and Yikai Zhao, National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; Chaoyang Ji, Penghui Mi, Guodong Yang, Qiong Xie, Hao Wang, Yinhua Wang, Bo Deng, Zhiqiang Liao, Chengqiang Huang, Yongqiang Yang, Xiang Huang, Wei Sun, and Xiaoping Zhu, Huawei Cloud Computing Technologies Co., Ltd., China
Network faults occur frequently on the Internet. From the perspective of cloud service providers, network faults fall into three categories: cloud faults, client faults, and middle faults. This paper focuses on middle faults. To minimize the harm of middle faults, we build a fully automatic system in Huawei Cloud, namely AAsclepius, which consists of a monitoring subsystem, a diagnosing subsystem, and a detouring subsystem. Through the collaboration of the three subsystems, AAsclepius monitors and diagnoses network faults, and detours traffic to circumvent middle faults at the Internet peering edge. The key innovation of AAsclepius is PathDebugging, a novel technique for identifying the fault direction. AAsclepius has been running stably for two years, protecting Huawei Cloud from major accidents in 2021 and 2022. Our evaluation on three major points of presence in December 2021 shows that AAsclepius can efficiently and safely detour traffic to circumvent outbound faults within a few minutes.
USENIX ATC '23 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)
@inproceedings {yang2023aasclepius,
author = {Kaicheng Yang and Yuanpeng Li and Sheng Long and Tong Yang and Ruijie Miao and Yikai Zhao and Chaoyang Ji and Penghui Mi and Guodong Yang and Qiong Xie and Hao Wang and Yinhua Wang and Bo Deng and Zhiqiang Liao and Chengqiang Huang and Yongqiang Yang and Xiang Huang and Wei Sun and Xiaoping Zhu},
title = {{AAsclepius}: Monitoring, Diagnosing, and Detouring at the Internet Peering Edge},
booktitle = {2023 USENIX Annual Technical Conference (USENIX ATC 23)},
year = {2023},
isbn = {978-1-939133-35-9},
address = {Boston, MA},
pages = {655--671},
url = {https://www.usenix.org/conference/atc23/presentation/yang-kaicheng},
publisher = {USENIX Association},
month = jul
}