LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search

Authors: 

Chengquan Feng, University of Science and Technology of China; Li Lyna Zhang, Microsoft Research; Yuanchi Liu, University of Science and Technology of China; Jiahang Xu and Chengruidong Zhang, Microsoft Research; Zhiyuan Wang, University of Science and Technology of China; Ting Cao and Mao Yang, Microsoft Research; Haisheng Tan, University of Science and Technology of China

Abstract: 

Hardware-Aware Neural Architecture Search (NAS) has demonstrated success in automating the design of affordable deep neural networks (DNNs) for edge platforms by incorporating inference latency into the search process. However, accurately and efficiently predicting DNN inference latency on diverse edge platforms remains a significant challenge. Current approaches require several days to construct a new latency predictor for each platform, which is prohibitively time-consuming and impractical.

In this paper, we propose LitePred, a lightweight approach for accurately predicting DNN inference latency on new platforms with minimal adaptation data by transferring existing predictors. LitePred builds on two key techniques: (i) a Variational Autoencoder (VAE) data sampler that samples high-quality training and adaptation data conforming to the model distributions in NAS search spaces, overcoming the out-of-distribution challenge; and (ii) a latency distribution-based similarity detection method that identifies the most similar pre-existing latency predictor for the new target platform, reducing the adaptation data required while achieving high prediction accuracy. Extensive experiments on 85 edge platforms and 6 NAS search spaces demonstrate the effectiveness of our approach, achieving an average latency prediction accuracy of 99.3% with less than an hour of adaptation cost. Compared with SOTA platform-specific methods, LitePred achieves up to 5.3% higher accuracy with a significant 50.6× reduction in profiling cost. Code and predictors are available at https://github.com/microsoft/Moonlit/tree/main/LitePred.
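The abstract names the two techniques without implementation detail. Purely as an illustrative sketch, and not the authors' code, the Python below shows how such a pipeline could look: a small VAE trained on flattened layer configurations, whose decoded latent samples serve as in-distribution profiling data, and a similarity search that ranks existing predictors by the distance between latency distributions over a shared probe set. The class and function names, the latent dimensionality, and the choice of the 1-D Wasserstein distance are assumptions, not details from the paper.

# Illustrative sketch only -- not the authors' implementation.
# Assumes PyTorch and SciPy; all names and hyperparameters are hypothetical.
import torch
import torch.nn as nn
from scipy.stats import wasserstein_distance

class ConfigVAE(nn.Module):
    """Tiny VAE over flattened DNN layer configurations."""
    def __init__(self, in_dim, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z ~ N(mu, sigma^2) differentiably.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Standard VAE objective: reconstruction error + KL divergence to N(0, I).
    recon_err = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl

@torch.no_grad()
def sample_configs(vae, n):
    """Decode random latents into candidate configurations to profile;
    samples follow the distribution the VAE learned from the search space."""
    z = torch.randn(n, vae.mu.out_features)
    return vae.decoder(z)

def most_similar_predictor(target_latencies, predictor_pool):
    """Rank pre-existing predictors by latency-distribution distance.

    target_latencies: latencies of a small probe set measured on the new platform.
    predictor_pool: {name: latencies of the same probe set on that predictor's
    source platform}. The 1-D Wasserstein distance is an illustrative choice.
    """
    return min(predictor_pool.items(),
               key=lambda kv: wasserstein_distance(target_latencies, kv[1]))

In this sketch, measuring a handful of probe models on the new platform suffices to pick the closest existing predictor, which is then fine-tuned on the VAE-sampled adaptation data, consistent with the workflow the abstract describes.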


BibTeX
@inproceedings {295655,
author = {Chengquan Feng and Li Lyna Zhang and Yuanchi Liu and Jiahang Xu and Chengruidong Zhang and Zhiyuan Wang and Ting Cao and Mao Yang and Haisheng Tan},
title = {{LitePred}: Transferable and Scalable Latency Prediction for {Hardware-Aware} Neural Architecture Search},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {1463--1477},
url = {https://www.usenix.org/conference/nsdi24/presentation/feng-chengquan},
publisher = {USENIX Association},
month = apr
}
