Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models

Authors: 

Hongbin Liu, Michael K. Reiter, and Neil Zhenqiang Gong, Duke University

Abstract: 

Foundation models have become the backbone of the AI ecosystem. In particular, a foundation model can be used as a general-purpose feature extractor to build various downstream classifiers. However, foundation models are vulnerable to backdoor attacks, and a backdoored foundation model is a single point of failure for the AI ecosystem, e.g., multiple downstream classifiers inherit its backdoor vulnerability simultaneously. In this work, we propose Mudjacking, the first method to patch foundation models to remove backdoors. Specifically, given a misclassified trigger-embedded input detected after a backdoored foundation model is deployed, Mudjacking adjusts the parameters of the foundation model to remove the backdoor. We formulate patching a foundation model as an optimization problem and propose a gradient-descent-based method to solve it. We evaluate Mudjacking on both vision and language foundation models, eleven benchmark datasets, five existing backdoor attacks, and thirteen adaptive backdoor attacks. Our results show that Mudjacking can remove backdoors from a foundation model while maintaining its utility.
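
To illustrate the high-level idea, below is a minimal sketch of what gradient-descent patching of a backdoored feature extractor could look like in PyTorch. The function name patch_foundation_model and the specific loss terms (pulling the trigger-embedded input's feature toward that of a clean reference input, plus a utility-preservation term against the original frozen model on clean data) are illustrative assumptions for this sketch, not the paper's exact formulation.

# Illustrative sketch only: a simplified gradient-descent patching loop,
# not the exact objective used by Mudjacking.
import copy
import torch
import torch.nn.functional as F

def patch_foundation_model(model, trigger_input, clean_reference,
                           clean_loader, lr=1e-4, epochs=5, lam=1.0):
    """Adjust `model` so the trigger-embedded input maps to a feature close
    to that of its clean counterpart, while keeping features on clean data
    close to those of the original (frozen) model to preserve utility."""
    frozen = copy.deepcopy(model).eval()          # reference for utility term
    for p in frozen.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for clean_batch in clean_loader:
            optimizer.zero_grad()
            # Backdoor-removal term: align the trigger input's feature with
            # the clean reference input's feature.
            removal = 1 - F.cosine_similarity(
                model(trigger_input), model(clean_reference)).mean()
            # Utility term: keep clean-data features close to the original
            # model's features.
            utility = 1 - F.cosine_similarity(
                model(clean_batch), frozen(clean_batch)).mean()
            loss = removal + lam * utility
            loss.backward()
            optimizer.step()
    return model

In this sketch, trigger_input is the misclassified trigger-embedded input detected after deployment, clean_reference is a clean input the downstream classifier handles correctly, and lam trades off backdoor removal against utility preservation.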
