Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning

Authors: 

Yi Zhai, University of Science and Technology of China; Sijia Yang, Huawei Technologies Co., Ltd.; Keyu Pan, ByteDance Ltd.; Renwei Zhang, Huawei Technologies Co., Ltd.; Shuo Liu, University of Science and Technology of China; Chao Liu and Zichun Ye, Huawei Technologies Co., Ltd.; Jianmin Ji, University of Science and Technology of China; Jie Zhao, Hunan University; Yu Zhang and Yanyong Zhang, University of Science and Technology of China

Abstract: 

Obtaining high-performance tensor programs with high efficiency continues to be a substantial challenge. Approaches that favor efficiency typically limit their exploration space through heuristic constraints, which often lack generalizability. Conversely, approaches targeting high performance tend to create an expansive exploration space but employ ineffective exploration strategies.

We propose a tensor program generation framework for deep learning applications. Its core idea is to maintain an expansive search space to ensure high performance, while exploring that space powerfully and efficiently with the help of language models. We thus transform the tensor program exploration task into a language model generation task. To facilitate this, we explicitly design a language-model-friendly tensor language that records decision information to represent tensor programs. During the compilation of target workloads, the tensor language model (TLM) combines knowledge from offline learning with previously made decisions to probabilistically sample the best decision in the current decision space. This enables more informed space exploration than the random sampling commonly used in prior approaches.
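To make the sampling idea above concrete, the following Python sketch shows how an LM-guided sampler might replace uniform random sampling when choosing scheduling decisions. It is a minimal illustration under assumed interfaces; all names (lm.score, decision_space, legal_next, is_complete) are hypothetical and do not correspond to the paper's actual API.

import math
import random

def softmax(xs):
    # Numerically stable softmax over raw scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_decision_sequence(lm, decision_space, temperature=1.0):
    """Generate one decision sequence (i.e., one candidate tensor program) guided by an LM."""
    decisions = []  # previously made decisions act as the generation prefix
    while not decision_space.is_complete(decisions):
        candidates = decision_space.legal_next(decisions)  # current decision space
        # Offline knowledge lives in the LM's weights; online context is the prefix.
        logits = lm.score(prefix=decisions, candidates=candidates)
        probs = softmax([l / temperature for l in logits])
        decisions.append(random.choices(candidates, weights=probs, k=1)[0])
    return decisions  # later lowered to a concrete tensor program and measured

The key contrast with random-search-based tuners is that the distribution over candidates is learned offline and conditioned on the decisions already made, rather than being uniform.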

Experimental results indicate that TLM excels in both efficiency and performance. Compared to fully tuned Ansor/MetaSchedule, TLM matches their performance with a 61× compilation speedup. Furthermore, when evaluated against Roller under the same compilation time budget, TLM improves performance by 2.25×. Code is available at https://github.com/zhaiyi000/tlm.

BibTeX
@inproceedings {298697,
author = {Yi Zhai and Sijia Yang and Keyu Pan and Renwei Zhang and Shuo Liu and Chao Liu and Zichun Ye and Jianmin Ji and Jie Zhao and Yu Zhang and Yanyong Zhang},
title = {Enabling Tensor Language Model to Assist in Generating {High-Performance} Tensor Programs for Deep Learning},
booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
year = {2024},
isbn = {978-1-939133-40-3},
address = {Santa Clara, CA},
pages = {289--305},
url = {https://www.usenix.org/conference/osdi24/presentation/zhai},
publisher = {USENIX Association},
month = jul
}