Devil’s Whisper: A General Approach for Physical Adversarial Attacks against Commercial Black-box Speech Recognition Devices

Authors: 

Yuxuan Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Department of Computer Science, Florida Institute of Technology; Xuejing Yuan, Jiangshan Zhang, and Yue Zhao, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Shengzhi Zhang, Department of Computer Science, Metropolitan College, Boston University, USA; Kai Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; XiaoFeng Wang, School of Informatics and Computing, Indiana University Bloomington

Abstract: 

Recently studies show that adversarial examples (AEs) can pose a serious threat to a “white-box” automatic speech recognition (ASR) system, when its machine-learning model is exposed to the adversary. Less clear is how realistic such a threat would be towards commercial devices, such as Google Home, Cortana, Echo, etc., whose models are not publicly available. Exploiting the learning model behind ASR system in black-box is challenging, due to the presence of complicated preprocessing and feature extraction even before the AEs could reach the model. Our research, however, shows that such a black-box attack is realistic. In the paper, we present Devil’s Whisper, a general adversarial attack on commercial ASR systems. Our idea is to enhance a simple local model roughly approximating the target black-box platform with a white-box model that is more advanced yet unrelated to the target. We find that these two models can effectively complement each other in predicting the target’s behavior, which enables highly transferable and generic attacks on the target. Using a novel optimization technique, we show that a local model built upon just over 1500 queries can be elevated by the open-source Kaldi Aspire Chain Model to effectively exploit commercial devices (Google Assistant, Google Home, Amazon Echo and Microsoft Cortana). For 98% of the target commands of these devices, our approach can generate at least one AE for attacking the target devices.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {247642,
author = {Yuxuan Chen and Xuejing Yuan and Jiangshan Zhang and Yue Zhao and Shengzhi Zhang and Kai Chen and XiaoFeng Wang},
title = {{Devil{\textquoteright}s} Whisper: A General Approach for Physical Adversarial Attacks against Commercial Black-box Speech Recognition Devices},
booktitle = {29th USENIX Security Symposium (USENIX Security 20)},
year = {2020},
isbn = {978-1-939133-17-5},
pages = {2667--2684},
url = {https://www.usenix.org/conference/usenixsecurity20/presentation/chen-yuxuan},
publisher = {USENIX Association},
month = aug
}

Presentation Video