Chang Zhu, Arizona State University; Ziyang Li and Anton Xue, University of Pennsylvania; Ati Priya Bajaj, Wil Gibbs, and Yibo Liu, Arizona State University; Rajeev Alur, University of Pennsylvania; Tiffany Bao, Arizona State University; Hanjun Dai, Google; Adam Doupé, Arizona State University; Mayur Naik, University of Pennsylvania; Yan Shoshitaishvili and Ruoyu Wang, Arizona State University; Aravind Machiry, Purdue University
Binary type inference is a core research challenge in binary program analysis and reverse engineering. It concerns identifying the data types of registers and memory values in a stripped executable (or object file), whose type information is discarded during compilation. Current methods rely on either manually crafted inference rules, which are brittle and demand significant effort to update, or machine learning-based approaches that suffer from low accuracy.
In this paper, we propose TYGR, a graph neural network-based solution that encodes data-flow information to infer both basic and struct variable types in stripped binary programs. To support different architectures and compiler optimizations, TYGR is implemented on top of the ANGR binary analysis platform and uses an architecture-agnostic data-flow analysis to extract a graph-based intra-procedural representation of data-flow information.
We noticed a severe lack of diversity in existing binary executable datasets and created TyDa, a large dataset of diverse binary executables. The sole publicly available dataset, provided by STATEFORMER, contains only 1% as many functions as TyDa. TYGR is trained and evaluated on a subset of TyDa and generalizes to the rest of the dataset. TYGR achieves an overall accuracy of 76.6% and a struct type accuracy of 45.2% on the x64 dataset across four optimization levels (O0-O3). TYGR outperforms existing work by a minimum of 26.1% in overall accuracy and 10.2% in struct accuracy.
author = {Chang Zhu and Ziyang Li and Anton Xue and Ati Priya Bajaj and Wil Gibbs and Yibo Liu and Rajeev Alur and Tiffany Bao and Hanjun Dai and Adam Doup{\'e} and Mayur Naik and Yan Shoshitaishvili and Ruoyu Wang and Aravind Machiry},
title = {{TYGR}: Type Inference on Stripped Binaries using Graph Neural Networks},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {4283--4300},
url = {https://www.usenix.org/conference/usenixsecurity24/presentation/zhu-chang},
publisher = {USENIX Association},
month = aug
}