Zheng Wang, University of California, San Diego; Yuke Wang, Boyuan Feng, and Guyue Huang, University of California, Santa Barbara; Dheevatsa Mudigere and Bharath Muthiah, Meta; Ang Li, Pacific Northwest National Laboratory; Yufei Ding, University of California, San Diego
The deployment of Deep Learning Recommendation Models (DLRMs) involves the parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and inter-GPU communication.
To this end, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to explore the connection between DLRM inputs and the efficiency of distributed EMTs, aiming to provide a near-optimal parallelization strategy for EMTs. Specifically, we conduct an in-depth analysis of various types of EMTs parallelism and propose a heuristic search algorithm to efficiently approximate an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared memory-based system, which supports the lightweight but complex computation and communication pattern of fine-grained EMT parallelization, effectively converting theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves 2.3× and 4.0× speedup on average in training and inference, respectively, over state-of-the-art DLRM frameworks.
USENIX ATC '24 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Zheng Wang and Yuke Wang and Boyuan Feng and Guyue Huang and Dheevatsa Mudigere and Bharath Muthiah and Ang Li and Yufei Ding},
title = {{OPER}: {Optimality-Guided} Embedding Table Parallelization for Large-scale Recommendation Model},
booktitle = {2024 USENIX Annual Technical Conference (USENIX ATC 24)},
year = {2024},
isbn = {978-1-939133-41-0},
address = {Santa Clara, CA},
pages = {667--682},
url = {https://www.usenix.org/conference/atc24/presentation/wang},
publisher = {USENIX Association},
month = jul
}