Extracting Training Data from Diffusion Models

Nicholas Carlini; Jamie Hayes; Milad Nasr; Matthew Jagielski; Vikash Sehwag; Florian Tramèr; Borja Balle; Daphne Ippolito; Eric Wallace

Authors:

Nicholas Carlini, Google; Jamie Hayes, DeepMind; Milad Nasr and Matthew Jagielski, Google; Vikash Sehwag, Princeton University; Florian Tramèr, ETH Zurich; Borja Balle, DeepMind; Daphne Ippolito, Google; Eric Wallace, UC Berkeley

Abstract:

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.

Nicholas Carlini is a research scientist at Google Brain. He analyzes the security and privacy of machine learning, for which he has received best paper awards at IEEE S&P and ICML. He received his PhD from the University of California, Berkeley in 2018.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {291199,
author = {Nicholas Carlini and Jamie Hayes and Milad Nasr and Matthew Jagielski and Vikash Sehwag and Florian Tram{\`e}r and Borja Balle and Daphne Ippolito and Eric Wallace},
title = {Extracting Training Data from Diffusion Models},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
isbn = {978-1-939133-37-3},
address = {Anaheim, CA},
pages = {5253--5270},
url = {https://www.usenix.org/conference/usenixsecurity23/presentation/carlini},
publisher = {USENIX Association},
month = aug
}

