Jiaxu Zhao, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; Beijing Key Laboratory of Network Security and Protection Technology; Yuekang Li, The University of New South Wales; Yanyan Zou, Zhaohui Liang, Yang Xiao, Yeting Li, Bingwei Peng, Nanyu Zhong, and Xinyi Wang, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; Beijing Key Laboratory of Network Security and Protection Technology; Wei Wang, Institute of Information Engineering, Chinese Academy of Sciences; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; Beijing Key Laboratory of Network Security and Protection Technology; Wei Huo, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; Beijing Key Laboratory of Network Security and Protection Technology
IoT devices have significantly impacted our daily lives, and detecting vulnerabilities in embedded systems early on is critical for ensuring their security. Among the existing vulnerability detection techniques for embedded systems, static taint analysis has been proven effective in detecting severe vulnerabilities, such as command injection vulnerabilities, which can cause remote code execution. Nevertheless, static taint analysis still faces the problem of identifying taint sources comprehensively and accurately.
This paper presents Lara, a novel static taint analysis technique to detect vulnerabilities in embedded systems. The design of Lara is inspired by an observation that pertains to semantic relations within and between the code and data of embedded software: user input entries can be categorized as URIs or keys (data), and identifying their handling code (code) and relations can help systematically and comprehensively identify the sources for taint analysis. Transforming the observation into a practical methodology poses challenges. To address these challenges, Lara employs a combination of pattern-based static analysis and large language model (LLM)-aided analysis, aiming to replicate and enhance how human experts would utilize the findings during analysis. The pattern-based static analysis simulates human experience, while the LLM-aided analysis captures the way human experts perceive code semantics. We implemented Lara and evaluated it on 203 IoT devices from 21 vendors. In general, Lara detects 556 and 602 more vulnerabilities than SaTC and Karonte, respectively, while reducing false positives by 57.0% and 54.3%. Meanwhile, with more sources and sinks from Lara, EmTaint can detect 245 more vulnerabilities. To date, Lara has found 245 0-day vulnerabilities in 57 devices, all of which were confirmed or fixed with 162 CVE IDs assigned.
@inproceedings {zhao2024leveraging,
author = {Jiaxu Zhao and Yuekang Li and Yanyan Zou and Zhaohui Liang and Yang Xiao and Yeting Li and Bingwei Peng and Nanyu Zhong and Xinyi Wang and Wei Wang and Wei Huo},
title = {Leveraging Semantic Relations in Code and Data to Enhance Taint Analysis of Embedded Systems},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {7067--7084},
url = {https://www.usenix.org/conference/usenixsecurity24/presentation/zhao},
publisher = {USENIX Association},
month = aug
}