Haopei Wang, Anubhavnidhi Abhashkumar, Changyu Lin, Tianrong Zhang, Xiaoming Gu, Ning Ma, Chang Wu, Songlin Liu, Wei Zhou, Yongbin Dong, Weirong Jiang, and Yi Wang, ByteDance Inc
In large-scale data center networks, answering users' network diagnosis queries still relies heavily on manual oncall services. A common scenario is a network user asking whether some network issue is causing problems for their services or applications. Handling such queries requires extensive experience and considerable effort from network engineers, who must repeatedly sift through numerous monitoring dashboards and logs. This process is notoriously slow, error-prone, and costly. We ask: is this the right solution, given the state of the art in network intelligence?
To answer this question, we first study thousands of real network diagnosis cases and derive insights into how to resolve them more efficiently. We then propose an AI-enabled diagnosis framework and instantiate it as a task-oriented, dialogue-based diagnosis system (colloquially, a chatbot) called NetAssistant. NetAssistant accepts questions in natural language and runs the appropriate diagnosis workflows in a timely manner. It has been deployed in our company's data centers for more than three years and is used hundreds of times every day. We show that it significantly reduces both the number and the duration of oncalls that require human involvement. We also share our experience in making it reliable and trustworthy, and showcase how it helps solve real production issues efficiently.
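The abstract does not describe NetAssistant's internals, so the following is only a minimal, hypothetical sketch of what "task-oriented dialogue" generally means: detect the user's intent, fill the slots that intent requires (asking follow-up questions for missing ones), and then dispatch to a diagnosis workflow. The keyword-based intent matching, the intents `packet_loss` and `latency`, the `src_ip`/`dst_ip` slots, and every function name below are illustrative assumptions, not the paper's design.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical intents, each recognized by simple keyword matching.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "packet_loss": ["packet loss", "drop", "dropping"],
    "latency": ["latency", "slow", "delay"],
}
# Hypothetical slots each intent needs before its workflow can run.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "packet_loss": ["src_ip", "dst_ip"],
    "latency": ["src_ip", "dst_ip"],
}
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


@dataclass
class DialogueState:
    """Information accumulated across turns of one conversation."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def detect_intent(text: str) -> Optional[str]:
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return intent
    return None


def extract_slots(text: str, state: DialogueState) -> None:
    # Fill still-missing IP slots, in order, with IPs found in this turn.
    missing = [s for s in ("src_ip", "dst_ip") if s not in state.slots]
    for name, ip in zip(missing, IP_RE.findall(text)):
        state.slots[name] = ip


# Placeholder workflows; a real one would query monitoring data and logs.
def diagnose_packet_loss(slots: Dict[str, str]) -> str:
    return f"Checking packet loss on path {slots['src_ip']} -> {slots['dst_ip']}"


def diagnose_latency(slots: Dict[str, str]) -> str:
    return f"Checking latency on path {slots['src_ip']} -> {slots['dst_ip']}"


WORKFLOWS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "packet_loss": diagnose_packet_loss,
    "latency": diagnose_latency,
}


def handle_turn(text: str, state: DialogueState) -> str:
    """Process one user message and return the assistant's reply."""
    if state.intent is None:
        state.intent = detect_intent(text)
        if state.intent is None:
            return "Sorry, which network problem are you seeing?"
    extract_slots(text, state)
    missing = [s for s in REQUIRED_SLOTS[state.intent] if s not in state.slots]
    if missing:
        return f"Please provide: {', '.join(missing)}"
    return WORKFLOWS[state.intent](state.slots)


if __name__ == "__main__":
    state = DialogueState()
    print(handle_turn("My service sees packet loss from 10.0.0.1", state))
    print(handle_turn("Destination is 10.0.1.2", state))
```

In this sketch the first message yields "Please provide: dst_ip" and the second triggers the packet-loss workflow for 10.0.0.1 -> 10.0.1.2. A production system would presumably replace keyword matching with a learned intent classifier and the placeholder workflows with real telemetry queries.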
NSDI '24 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)
author = {Haopei Wang and Anubhavnidhi Abhashkumar and Changyu Lin and Tianrong Zhang and Xiaoming Gu and Ning Ma and Chang Wu and Songlin Liu and Wei Zhou and Yongbin Dong and Weirong Jiang and Yi Wang},
title = {{NetAssistant}: Dialogue Based Network Diagnosis in Data Center Networks},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {2011--2024},
url = {https://www.usenix.org/conference/nsdi24/presentation/wang-haopei},
publisher = {USENIX Association},
month = apr
}