A large-scale deployment of DCTCP

Authors: 

Abhishek Dhamija and Balasubramanian Madhavan, Meta; Hechao Li, Netflix; Jie Meng, Shrikrishna Khare, and Madhavi Rao, Meta; Lawrence Brakmo; Neil Spring, Prashanth Kannan, and Srikanth Sundaresan, Meta; Soudeh Ghorbani, Meta and Johns Hopkins University

Abstract: 

This paper describes the process and operational experiences of deploying the Data Center TCP (DCTCP) protocol in a large-scale data center network. In contrast to legacy congestion control protocols that rely on loss as the primary signal of congestion, DCTCP signals in-network congestion (based on queue occupancy) to senders and adjusts the sending rate proportional to the level of congestion. At the time of our deployment, this protocol was well-studied and fairly established with proven efficiency gains in other networks. As expected, we also observed improved performance, and notably decreased packet losses, compared to legacy protocols in our data centers. Perhaps unexpectedly, however, we faced numerous hurdles in rolling out DCTCP; we chronicle these unexpected challenges, ranging from its unfairness (to other classes of traffic) to implementation bugs. We close by discussing some of the open research questions and challenges.

NSDI '24 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {295495,
author = {Abhishek Dhamija and Balasubramanian Madhavan and Hechao Li and Jie Meng and Shrikrishna Khare and Madhavi Rao and Lawrence Brakmo and Neil Spring and Prashanth Kannan and Srikanth Sundaresan and Soudeh Ghorbani},
title = {A large-scale deployment of {DCTCP}},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {239--252},
url = {https://www.usenix.org/conference/nsdi24/presentation/dhamija},
publisher = {USENIX Association},
month = apr
}