Xuejing Yuan, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences. School of Cyber Security, University of Chinese Academy of Sciences.; Yuxuan Chen, Florida Institute of Technology; Yue Zhao, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences. School of Cyber Security, University of Chinese Academy of Sciences.; Yunhui Long, University of Illinois at Urbana-Champaign; Xiaokang Liu and Kai Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences. School of Cyber Security, University of Chinese Academy of Sciences.; Shengzhi Zhang, Florida Institute of Technology. Department of Computer Science, Metropolitan College, Boston University, USA; Heqing Huang, unaffiliated; Xiaofeng Wang, Indiana University Bloomington; Carl A. Gunter, University of Illinois at Urbana-Champaign
The popularity of automatic speech recognition (ASR) systems, like Google Assistant, Cortana, brings in security concerns, as demonstrated by recent attacks. The impacts of such threats, however, are less clear, since they are either less stealthy (producing noise-like voice commands) or requiring the physical presence of an attack device (using ultrasound speakers or transducers). In this paper, we demonstrate that not only are more practical and surreptitious attacks feasible but they can even be automatically constructed. Specifically, we find that the voice commands can be stealthily embedded into songs, which, when played, can effectively control the target system through ASR without being noticed. For this purpose, we developed novel techniques that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener. Our research shows that this can be done automatically against real world ASR applications. We also demonstrate that such CommanderSongs can be spread through Internet (e.g., YouTube) and radio, potentially affecting millions of ASR users. Finally we present mitigation techniques that defend existing ASR systems against such threat.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Xuejing Yuan and Yuxuan Chen and Yue Zhao and Yunhui Long and Xiaokang Liu and Kai Chen and Shengzhi Zhang and Heqing Huang and XiaoFeng Wang and Carl A. Gunter},
title = {{CommanderSong}: A Systematic Approach for Practical Adversarial Voice Recognition},
booktitle = {27th USENIX Security Symposium (USENIX Security 18)},
year = {2018},
isbn = {978-1-939133-04-5},
address = {Baltimore, MD},
pages = {49--64},
url = {https://www.usenix.org/conference/usenixsecurity18/presentation/yuan-xuejing},
publisher = {USENIX Association},
month = aug
}