Transcend: Detecting Concept Drift in Malware Classification Models

Roberto Jordaney; Kumar Sharad; Santanu K. Dash; Zhi Wang; Davide Papini; Ilia Nouretdinov; Lorenzo Cavallaro

Authors:

Roberto Jordaney, Royal Holloway, University of London; Kumar Sharad, NEC Laboratories Europe; Santanu K. Dash, University College London; Zhi Wang, Nankai University; Davide Papini, Elettronica S.p.A.; Ilia Nouretdinov, and Lorenzo Cavallaro, Royal Holloway, University of London

Abstract:

Building machine learning models of malware behavior is widely accepted as a panacea towards effective malware classification. A crucial requirement for building sustainable learning models, though, is to train on a wide variety of malware samples. Unfortunately, malware evolves rapidly and it thus becomes hard—if not impossible—to generalize learning models to reflect future, previously-unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, becoming rapidly antiquated as malware continues to evolve. In this work, we propose Transcend, a framework to identify aging classification models in vivo during deployment, much before the machine learning model’s performance starts to degrade. This is a significant departure from conventional approaches that retrain aging models retrospectively when poor performance is observed. Our approach uses a statistical comparison of samples seen during deployment with those used to train the model, thereby building metrics for prediction quality. We show how Transcend can be used to identify concept drift based on two separate case studies on Android andWindows malware, raising a red flag before the model starts making consistently poor decisions due to out-of-date training.

Roberto Jordaney, Royal Holloway, University of London

Kumar Sharad, NEC Laboratories Europe

Santanu K. Dash, University College London

Zhi Wang, Nankai University

Davide Papini, Elettronica S.p.A

Ilia Nouretdinov, Royal Holloway, University of London

Lorenzo Cavallaro, Royal Holloway, University of London

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {203684,
author = {Roberto Jordaney and Kumar Sharad and Santanu K. Dash and Zhi Wang and Davide Papini and Ilia Nouretdinov and Lorenzo Cavallaro},
title = {Transcend: Detecting Concept Drift in Malware Classification Models},
booktitle = {26th USENIX Security Symposium (USENIX Security 17)},
year = {2017},
isbn = {978-1-931971-40-9},
address = {Vancouver, BC},
pages = {625--642},
url = {https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/jordaney},
publisher = {USENIX Association},
month = aug
}

Download