Wenxin Li, Xin He, Yuan Liu, and Keqiu Li, Tianjin University; Kai Chen, Hong Kong University of Science and Technology and University of Science and Technology of China; Zhao Ge and Zewei Guan, Tianjin University; Heng Qi, Dalian University of Technology; Song Zhang, Tianjin University; Guyue Liu, New York University Shanghai
Most existing data center network (DCN) flow scheduling solutions aim to minimize flow completion times (FCT). However, these solutions either require precise flow information (e.g., per-flow size) and are challenging to implement on commodity switches (e.g., pFabric), or use no prior flow information at all, which comes at the cost of performance (e.g., PIAS). In this work, we present QCLIMB, a new flow scheduling solution designed to minimize FCT by utilizing imprecise flow information. Our key observation is that although obtaining precise flow information can be challenging, it is possible to accurately estimate each flow's lower and upper bounds with machine learning techniques.
QCLIMB has two key parts: i) a novel scheduling algorithm that leverages the lower bounds of different flows to prioritize small flows over large flows from the beginning of transmission, rather than at later stages; and ii) an efficient out-of-order handling mechanism that addresses practical reordering issues resulting from the algorithm. We show that QCLIMB significantly outperforms PIAS (88% lower average FCT of small flows) and is surprisingly close to pFabric (around 9% gap) while not requiring any switch modifications.
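To make the idea in part i) concrete, here is a minimal sketch of how a lower-bound estimate could set a flow's starting priority in a multi-level queue. It is an illustration only, not the algorithm from the paper: the threshold values and the function names `queue_for_bytes`, `pias_like_queue`, and `lower_bound_queue` are assumptions, and the PIAS-like baseline is simplified to "demote by bytes sent".

```python
from bisect import bisect_right

# Hypothetical demotion thresholds in bytes (assumed values, not from the paper).
# Queue 0 is the highest priority; larger indices are lower priority.
THRESHOLDS = [100_000, 1_000_000, 10_000_000]


def queue_for_bytes(nbytes, thresholds=THRESHOLDS):
    """Return the index of the priority queue that matches a byte count."""
    return bisect_right(thresholds, nbytes)


def pias_like_queue(bytes_sent):
    """Baseline without prior information: every flow starts in queue 0
    and is demoted only as its sent-byte count crosses each threshold."""
    return queue_for_bytes(bytes_sent)


def lower_bound_queue(bytes_sent, lower_bound):
    """Illustrative variant: a flow known to be at least `lower_bound` bytes
    starts in the queue matching that bound, so a large flow is deprioritized
    from its first packet. Demotion by bytes sent still applies afterwards."""
    return queue_for_bytes(max(bytes_sent, lower_bound))


if __name__ == "__main__":
    # A flow estimated to be at least 5 MB, before it has sent anything.
    print(pias_like_queue(0))               # baseline queue at start
    print(lower_bound_queue(0, 5_000_000))  # queue when the bound is used
```

In this sketch a flow whose lower bound is small still begins in the highest-priority queue, and a flow that outgrows an underestimated bound is demoted as it sends more bytes, so the bound only affects where the flow starts.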
NSDI '24 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)
author = {Wenxin Li and Xin He and Yuan Liu and Keqiu Li and Kai Chen and Zhao Ge and Zewei Guan and Heng Qi and Song Zhang and Guyue Liu},
title = {Flow Scheduling with Imprecise Knowledge},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {95--111},
url = {https://www.usenix.org/conference/nsdi24/presentation/li-wenxin},
publisher = {USENIX Association},
month = apr
}