Yufei Du, University of North Carolina at Chapel Hill; Ryan Court and Kevin Snow, Zeropoint Dynamics; Fabian Monrose, University of North Carolina at Chapel Hill
Identifying a binary’s compiler configuration enables developers and analysts to locate potential security issues caused by optimization side-effects, identify binary clones, and build compatible binary patches. Existing work focuses on identifying compiler family, version and optimization level of a binary using semantic features and deep learning techniques. Unfortunately, in practice, binaries are an amalgamation of objects and functions that can be compiled at different optimization levels with a variety of individual, fine-grained, optimizations that may be applied depending on the structure of the code. Hence, rather than recovering high-level artifacts, i.e., compiler family, version, and optimization level, we explore the recovery of individual, fine-grained, optimization passes for each function in a binary. To do so, we develop an approach using specially crafted features alongside intuitive and understandable machine learning models. Our evaluation on 15 popular open-source repositories shows that our approach compares favorable with the state-of-the-art deep learning approach in compiler family, compiler version and optimization level identification. For fine-grained optimization passes, our evaluation on 149,814 functions from 552 binaries in four popular open-source repositories shows that our approach achieves an average F-1 score of 92.1% for all optimization passes and an average F-1 score of 89.8% for optimization passes that could have negative impacts on security. Moreover, our approach includes experimental support for dynamic feature extraction via binary emulation, and our results shows that such features offer promising potential in improving the accuracy of optimization pass identification.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Yufei Du and Ryan Court and Kevin Snow and Fabian Monrose},
title = {Automatic Recovery of Fine-grained Compiler Artifacts at the Binary Level},
booktitle = {2022 USENIX Annual Technical Conference (USENIX ATC 22)},
year = {2022},
isbn = {978-1-939133-29-49},
address = {Carlsbad, CA},
pages = {853--868},
url = {https://www.usenix.org/conference/atc22/presentation/du},
publisher = {USENIX Association},
month = jul
}