ACCL+: an FPGA-Based Collective Engine for Distributed Applications

Authors: 

Zhenhao He, Dario Korolija, Yu Zhu, and Benjamin Ramhorst, Systems Group, ETH Zurich; Tristan Laan, University of Amsterdam; Lucian Petrica and Michaela Blott, AMD Research; Gustavo Alonso, Systems Group, ETH Zurich

Abstract: 

FPGAs are increasingly prevalent in cloud deployments, serving as Smart-NICs or network-attached accelerators. To facilitate the development of distributed applications with FPGAs, in this paper we propose ACCL+, an open-source, FPGA-based collective communication library. Portable across different platforms and supporting UDP, TCP, as well as RDMA, ACCL+ empowers FPGA applications to initiate direct FPGA-to-FPGA collective communication. Additionally, it can serve as a collective offload engine for CPU applications, freeing the CPU from networking tasks. It is user-extensible, allowing new collectives to be implemented and deployed without having to re-synthesize the entire design. We evaluated ACCL+ on an FPGA cluster with 100 Gb/s networking, comparing its performance against software MPI over RDMA. The results demonstrate ACCL+'s significant advantages for FPGA-based distributed applications and its competitive performance for CPU applications. We showcase ACCL+'s dual role with two use cases: as a collective offload engine to distribute CPU-based vector-matrix multiplication, and as a component in designing fully FPGA-based distributed deep-learning recommendation inference.

OSDI '24 Open Access sponsored by King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {298689,
author = {Zhenhao He and Dario Korolija and Yu Zhu and Benjamin Ramhorst and Tristan Laan and Lucian Petrica and Michaela Blott and Gustavo Alonso},
title = {{ACCL+}: an {FPGA-Based} Collective Engine for Distributed Applications},
booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
year = {2024},
isbn = {978-1-939133-40-3},
address = {Santa Clara, CA},
pages = {211--231},
url = {https://www.usenix.org/conference/osdi24/presentation/he},
publisher = {USENIX Association},
month = jul
}
