LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks

Jiahao Yu; Xingwei Lin; Zheng Yu; Xinyu Xing

Authors:

Jiahao Yu, Northwestern University; Xingwei Lin, Ant Group; Zheng Yu and Xinyu Xing, Northwestern University

Abstract:

The jailbreak threat poses a significant concern for Large Language Models (LLMs), primarily due to their potential to generate content at scale. If not properly controlled, LLMs can be exploited to produce undesirable outcomes, including the dissemination of misinformation, offensive content, and other forms of harmful or unethical behavior. To tackle this pressing issue, researchers and developers often rely on red-team efforts to manually create adversarial inputs and prompts designed to push LLMs into generating harmful, biased, or inappropriate content. However, this approach encounters serious scalability challenges.

To address these scalability issues, we introduce an automated solution for large-scale LLM jailbreak susceptibility assessment called LLM-Fuzzer. Inspired by fuzz testing, LLM-Fuzzer uses human-crafted jailbreak prompts as starting points. By employing carefully customized seed selection strategies and mutation mechanisms, LLM-Fuzzer generates additional jailbreak prompts tailored to specific LLMs. Our experiments show that LLM-Fuzzer-generated jailbreak prompts demonstrate significantly increased exploitability and transferability. This highlights that many open-source and commercial LLMs suffer from severe jailbreak issues, even after safety fine-tuning.
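The loop described above (pick a seed, mutate it, query the target, keep what works) can be sketched generically. The following is a minimal illustration of a seed-selection-plus-mutation fuzz loop, not the authors' implementation: the names `Seed`, `select_seed`, `mutate`, and `fuzz` are hypothetical, the epsilon-greedy selection and sentence-shuffle mutation are stand-ins chosen for brevity, and the `oracle` callable is an assumed placeholder for whatever combination of target model and response judge an assessment would use.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Seed:
    """A template in the pool, with simple success statistics."""
    text: str
    visits: int = 0
    hits: int = 0


def select_seed(pool: List[Seed], rng: random.Random, explore: float = 0.2) -> Seed:
    """Assumed epsilon-greedy strategy: usually pick the seed with the best
    smoothed hit rate, occasionally pick one at random to keep exploring."""
    if rng.random() < explore:
        return rng.choice(pool)
    return max(pool, key=lambda s: (s.hits + 1) / (s.visits + 1))


def mutate(text: str, rng: random.Random) -> str:
    """Toy mutation operator: shuffle sentence order.
    A stand-in for real mutation operators, which this sketch does not model."""
    parts = [p for p in text.split(". ") if p]
    rng.shuffle(parts)
    return ". ".join(parts)


def fuzz(seeds: List[str], oracle: Callable[[str], bool],
         budget: int, rng: random.Random) -> List[str]:
    """Run `budget` iterations; candidates accepted by `oracle` are recorded
    and added back to the pool so later rounds can build on them."""
    pool = [Seed(s) for s in seeds]
    found: List[str] = []
    for _ in range(budget):
        parent = select_seed(pool, rng)
        parent.visits += 1
        candidate = mutate(parent.text, rng)
        if oracle(candidate):  # placeholder for "query target, judge response"
            parent.hits += 1
            pool.append(Seed(candidate))
            found.append(candidate)
    return found
```

Feeding successful candidates back into the pool is what distinguishes this from random mutation of a fixed seed set: later iterations can start from variants that already worked against the specific target.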

BibTeX

@inproceedings {299691,
author = {Jiahao Yu and Xingwei Lin and Zheng Yu and Xinyu Xing},
title = {{LLM-Fuzzer}: Scaling Assessment of Large Language Model Jailbreaks},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {4657--4674},
url = {https://www.usenix.org/conference/usenixsecurity24/presentation/yu-jiahao},
publisher = {USENIX Association},
month = aug
}
