Zhuo Zhang, Guanhong Tao, Guangyu Shen, Shengwei An, Qiuling Xu, Yingqi Liu, and Yapeng Ye, Purdue University; Yaoxuan Wu, University of California, Los Angeles; Xiangyu Zhang, Purdue University
Deep Learning (DL) models are increasingly used in many cyber-security applications and achieve superior performance compared to traditional solutions. In this paper, we study backdoor vulnerabilities in naturally trained models used in binary analysis. These backdoors are not injected by attackers but are rather byproducts of defects in datasets and/or training processes. An attacker can exploit these vulnerabilities by injecting a small, fixed input pattern (e.g., an instruction), called a backdoor trigger, into an input (e.g., a binary code snippet fed to a malware-detection DL model) to induce misclassification (e.g., so the malware evades detection). We focus on transformer models used in binary analysis. Given a model, we leverage a trigger inversion technique specifically designed for these models to derive trigger instructions that induce misclassification. During an attack, we use a novel trigger injection technique to insert the trigger instruction(s) into the input binary code snippet. The injection ensures that the code snippet's original program semantics are preserved and that the trigger becomes an integral part of those semantics, so it cannot be easily eliminated. We evaluate our prototype PELICAN on 5 binary analysis tasks and 15 models. The results show that PELICAN can effectively induce misclassification on all the evaluated models in both white-box and black-box scenarios. Our case studies demonstrate that PELICAN can exploit the backdoor vulnerabilities of two closed-source commercial tools.
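The two sketches below illustrate, in generic form, the two steps the abstract describes. Neither is PELICAN's actual algorithm; all function and parameter names are hypothetical placeholders.

First, trigger inversion. A common way to derive a trigger for a transformer classifier is to optimize a soft (continuous) trigger over the token vocabulary against a target label, then project it to concrete tokens. This minimal sketch assumes a PyTorch `model` that accepts instruction embeddings of shape (batch, length, dim) and returns class logits, plus its `nn.Embedding` layer `embed`.

```python
import torch
import torch.nn.functional as F

def invert_trigger(model, embed, benign_batch, target_label,
                   trigger_len=1, steps=500, lr=0.1):
    """Derive trigger token ids that push `benign_batch` toward
    `target_label` (generic gradient-based inversion, not PELICAN's)."""
    vocab_size, _ = embed.weight.shape
    # Soft trigger: one distribution over the vocabulary per trigger slot.
    logits = torch.zeros(trigger_len, vocab_size, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    target = torch.full((benign_batch.size(0),), target_label, dtype=torch.long)
    for _ in range(steps):
        probs = F.softmax(logits, dim=-1)            # (L, V)
        trig = probs @ embed.weight                  # (L, D) expected embedding
        inputs = embed(benign_batch)                 # (B, T, D)
        # Prepend the soft trigger to every snippet in the batch.
        stitched = torch.cat(
            [trig.unsqueeze(0).expand(inputs.size(0), -1, -1), inputs], dim=1)
        loss = F.cross_entropy(model(stitched), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Project each soft slot to its most likely concrete instruction token.
    return logits.argmax(dim=-1)
```

Second, trigger injection. A toy way to preserve program semantics when inserting an instruction is to place it only where the registers it writes are dead (never read before being overwritten), determined by backward liveness over straight-line code. Note that this toy insertion, unlike PELICAN's, is trivially removable by dead-code elimination; the paper's injection entangles the trigger with the snippet's original semantics. The `(mnemonic, defs, uses)` encoding is an assumption.

```python
def inject_trigger(snippet, trigger):
    """snippet: list of (mnemonic, defs, uses) tuples in execution order.
    Inserts `trigger` where the registers it defines are dead.
    Ignores flags, memory, and control flow for simplicity."""
    n = len(snippet)
    live_in = [set() for _ in range(n + 1)]  # live_in[n]: nothing live after
    live = set()
    for i in range(n - 1, -1, -1):           # backward liveness analysis
        _, defs, uses = snippet[i]
        live = (live - set(defs)) | set(uses)
        live_in[i] = set(live)
    t_defs = set(trigger[1])
    # Position i is safe if the trigger's outputs are never read afterwards
    # before being redefined.
    safe = [i for i in range(n + 1) if not (t_defs & live_in[i])]
    if not safe:
        raise ValueError("no semantics-preserving insertion point")
    return snippet[:safe[0]] + [trigger] + snippet[safe[0]:]

# Example: edx is never read, so a "mov edx, ..." trigger fits anywhere.
code = [("mov", ["eax"], ["ebx"]), ("add", ["ecx"], ["ecx", "eax"])]
print(inject_trigger(code, ("mov", ["edx"], [])))
```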
@inproceedings{zhang2023pelican,
  author    = {Zhuo Zhang and Guanhong Tao and Guangyu Shen and Shengwei An and Qiuling Xu and Yingqi Liu and Yapeng Ye and Yaoxuan Wu and Xiangyu Zhang},
  title     = {{PELICAN}: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis},
  booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
  year      = {2023},
  isbn      = {978-1-939133-37-3},
  address   = {Anaheim, CA},
  pages     = {2365--2382},
  url       = {https://www.usenix.org/conference/usenixsecurity23/presentation/zhang-zhuo-pelican},
  publisher = {USENIX Association},
  month     = aug
}