Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models

Authors: 

Shubham Agarwal and Subrata Mitra, Adobe Research; Sarthak Chakraborty, UIUC; Srikrishna Karanam, Koyel Mukherjee, and Shiv Kumar Saini, Adobe Research

Abstract: 

Text-to-image generation using diffusion models has seen explosive popularity owing to the models' ability to produce high-quality images that adhere to text prompts. However, diffusion models go through a large number of iterative denoising steps and are resource-intensive, requiring expensive GPUs and incurring considerable latency. In this paper, we introduce a novel approximate-caching technique that can reduce the number of such iterative denoising steps by reusing intermediate noise states created during a prior image generation. Based on this idea, we present an end-to-end text-to-image generation system, NIRVANA, that uses approximate-caching with a novel cache management policy to provide 21% GPU compute savings, 19.8% end-to-end latency reduction, and 19% dollar savings on two real production workloads. We further present an extensive characterization of real production text-to-image prompts from the perspective of caching, popularity, and reuse of intermediate states in a large production environment.
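The abstract describes approximate caching only at a high level: intermediate noise states from an earlier generation are reused so that a new request can skip some denoising steps. The Python sketch below illustrates one way such a cache could be organized. It assumes that prompts are compared through embedding vectors using cosine similarity, that latents are saved at a few fixed steps, and that a closer match permits resuming from a later step. The class name ApproxNoiseCache, the similarity thresholds, and the LRU eviction are hypothetical illustrations. They are not NIRVANA's actual design or its cache management policy.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ApproxNoiseCache:
    """Hypothetical approximate cache of intermediate denoising states.

    Each entry maps a prompt key to (prompt embedding, {step: saved latent}).
    Eviction is plain LRU, an assumed stand-in for the paper's policy.
    """

    def __init__(self, capacity, saved_steps, thresholds):
        self.capacity = capacity
        # Steps at which latents are saved, e.g. [10, 20]; thresholds[i] is the
        # (assumed) minimum similarity needed to resume from saved_steps[i].
        self.saved_steps = list(saved_steps)
        self.thresholds = list(thresholds)
        self.entries = OrderedDict()

    def insert(self, key, embedding, latents_by_step):
        self.entries[key] = (embedding, latents_by_step)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry

    def lookup(self, embedding):
        """Return (k, latent) to resume denoising at step k, or (0, None) on a miss."""
        best_key, best_sim = None, -math.inf
        for key, (emb, _) in self.entries.items():
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return 0, None
        # Choose the latest saved step whose threshold the similarity meets:
        # a closer match lets more denoising steps be skipped.
        k = 0
        for step, thr in zip(self.saved_steps, self.thresholds):
            if best_sim >= thr and step > k:
                k = step
        if k == 0:
            return 0, None
        self.entries.move_to_end(best_key)  # a hit counts as a recent use
        return k, self.entries[best_key][1][k]


if __name__ == "__main__":
    cache = ApproxNoiseCache(capacity=100, saved_steps=[10, 20], thresholds=[0.8, 0.95])
    cache.insert("a red car", [1.0, 0.0], {10: "latent@10", 20: "latent@20"})
    print(cache.lookup([1.0, 0.1]))  # similarity ~0.995 -> (20, 'latent@20')
    print(cache.lookup([1.0, 1.0]))  # similarity ~0.707 -> (0, None)
```

In this sketch, a hit at step k means the generator would run only the remaining N − k of N denoising steps, starting from the retrieved latent. A miss falls back to all N steps from fresh noise. Saving several intermediate steps per entry lets the amount of reuse track how closely the new prompt matches the cached one.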

NSDI '24 Open Access sponsored by King Abdullah University of Science and Technology (KAUST)

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX
@inproceedings {295597,
author = {Shubham Agarwal and Subrata Mitra and Sarthak Chakraborty and Srikrishna Karanam and Koyel Mukherjee and Shiv Kumar Saini},
title = {Approximate Caching for Efficiently Serving {Text-to-Image} Diffusion Models},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {1173--1189},
url = {https://www.usenix.org/conference/nsdi24/presentation/agarwal-shubham},
publisher = {USENIX Association},
month = apr
}