Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor

Authors: 

Zhen Xie, Binghamton University; Murali Emani, Argonne National Laboratory; Xiaodong Yu, Stevens Institute of Technology; Dingwen Tao, Indiana University; Xin He, Xidian University; Pengfei Su, University of California, Merced; Keren Zhou, George Mason University; Venkatram Vishwanath, Argonne National Laboratory

Abstract: 

For an extended period, graphics processing units (GPUs) have stood as the exclusive choice for training deep neural network (DNN) models. Over time, to serve the growing demands in a more targeted manner, various artificial-intelligence-specific hardware platforms, referred to as AI accelerators, have been vigorously developed, aiming to provide more efficient DNN acceleration solutions. However, these solutions are themselves heterogeneous, which complicates accelerator selection. Given a DNN model and a training objective, such as throughput or price-performance ratio, it remains challenging to arrive at the optimal decision among the many options, due to high reimplementation costs and hard-to-predict performance.

To tackle this challenge, we propose Centimani, a performance predictor that accurately and rapidly predicts DNN training throughput on various AI accelerators, thereby facilitating the accelerator selection process. To achieve this goal, we first analyze typical AI accelerators and draw observations that abstract AI accelerator designs and guide our performance modeling approach. In particular, we construct a memory estimation model and decoupled performance models to select the most appropriate batch size and predict the execution time of DNN training. We validate our approach by applying Centimani to six common DNN models on four typical AI accelerators. Results show that Centimani predicts throughput with an average accuracy of 93.1% on single-device training and 90.4% on multiple-device training; thus, the accelerator that best matches the user's training objective can be identified.
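The abstract only names Centimani's components (a memory estimation model, decoupled performance models, batch-size selection, and objective-driven ranking), so the following is a minimal Python sketch of how such a predictor could fit together. Every model form here (the additive memory model, the roofline-style throughput model) and every device number and per-sample cost is a hypothetical stand-in, not Centimani's actual formulation.

```python
# Hypothetical sketch of an accelerator-selection flow in the spirit of the
# abstract: estimate memory to pick a batch size, predict throughput with a
# decoupled compute/memory model, then rank devices by the user's objective.
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    memory_gb: float          # device memory capacity (GB)
    peak_tflops: float        # peak compute throughput (TFLOPS)
    mem_bandwidth_gbs: float  # memory bandwidth (GB/s)
    price_usd_hr: float       # hourly rental price (USD)


def estimate_memory_gb(params_m: float, acts_mb_per_sample: float, batch: int) -> float:
    """Toy memory model: fp32 weights + optimizer state + activations."""
    weights = params_m * 4 / 1024            # millions of params -> GB
    optimizer = 2 * weights                  # e.g., Adam moment buffers
    activations = acts_mb_per_sample * batch / 1024
    return weights + optimizer + activations


def max_batch_size(acc: Accelerator, params_m: float, acts_mb: float) -> int:
    """Largest power-of-two batch whose estimated footprint fits in memory."""
    batch = 1
    while estimate_memory_gb(params_m, acts_mb, batch * 2) < 0.9 * acc.memory_gb:
        batch *= 2
    return batch


def predict_throughput(acc: Accelerator, flops_g_per_sample: float,
                       bytes_g_per_sample: float, batch: int) -> float:
    """Decoupled roofline-style estimate: bound by compute or memory traffic."""
    compute_s = batch * flops_g_per_sample / (acc.peak_tflops * 1e3)
    memory_s = batch * bytes_g_per_sample / acc.mem_bandwidth_gbs
    return batch / max(compute_s, memory_s)  # samples per second


accelerators = [
    Accelerator("DeviceA", 40, 312, 1555, 3.0),
    Accelerator("DeviceB", 32, 150, 1200, 1.2),
]

# Hypothetical per-sample costs for one DNN model.
PARAMS_M, ACTS_MB, FLOPS_G, BYTES_G = 340, 120, 90, 25

for objective in ("throughput", "price-performance"):
    def score(acc: Accelerator) -> float:
        batch = max_batch_size(acc, PARAMS_M, ACTS_MB)
        tput = predict_throughput(acc, FLOPS_G, BYTES_G, batch)
        return tput if objective == "throughput" else tput / acc.price_usd_hr
    best = max(accelerators, key=score)
    print(f"best for {objective}: {best.name}")
```

Note the design point the abstract implies: because the memory model and the performance models are decoupled, batch-size selection can be resolved per device before any throughput prediction, avoiding reimplementation or trial runs on each candidate accelerator.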


BibTeX
@inproceedings {298633,
author = {Zhen Xie and Murali Emani and Xiaodong Yu and Dingwen Tao and Xin He and Pengfei Su and Keren Zhou and Venkatram Vishwanath},
title = {Centimani: Enabling Fast {AI} Accelerator Selection for {DNN} Training with a Novel Performance Predictor},
booktitle = {2024 USENIX Annual Technical Conference (USENIX ATC 24)},
year = {2024},
isbn = {978-1-939133-41-0},
address = {Santa Clara, CA},
pages = {1203--1221},
url = {https://www.usenix.org/conference/atc24/presentation/xie},
publisher = {USENIX Association},
month = jul
}