
Understanding and Mitigating Copying in Diffusion Models
Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein

Tue Dec 12 08:45 AM -- 10:45 AM (PST) @ Great Hall & Hall B1+B2 #1913

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set. Code is available at https://github.com/somepago/DCR.
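The abstract describes reducing replication by randomizing and augmenting image captions. The snippet below is a minimal sketch of one such augmentation, assuming word-level random insertion: a few words drawn from a user-supplied vocabulary are inserted at random positions in a caption before it is passed to the text encoder. The helper name `randomize_caption`, its parameters, and the vocabulary are hypothetical and for illustration only. This is not the authors' implementation; their code is in the repository linked above.

```python
import random
from typing import List, Optional


def randomize_caption(
    caption: str,
    vocab: List[str],
    num_insertions: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical helper: insert `num_insertions` random words from `vocab`
    at random positions in `caption`, leaving the original words in order."""
    rng = rng or random.Random()
    words = caption.split()
    for _ in range(num_insertions):
        # randint is inclusive on both ends, so a word may also be appended at the end
        position = rng.randint(0, len(words))
        words.insert(position, rng.choice(vocab))
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is reproducible
    vocab = ["photo", "art", "blue", "vintage"]  # illustrative vocabulary, not from the paper
    print(randomize_caption("a cat sitting on a sofa", vocab, num_insertions=2, rng=rng))
```

Applied each time a caption is sampled during training, a perturbation like this keeps the model from seeing exactly the same text conditioning for an image on every epoch, which is the intuition the abstract points to.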

Author Information

Gowthami Somepalli (University of Maryland, College Park)
Vasu Singla (University of Maryland)

I am a 5th-year grad student at the University of Maryland, interested in the security and privacy of ML systems.

Micah Goldblum (New York University)
Jonas Geiping (ELLIS Institute & MPI Intelligent Systems, Tübingen AI Center)

Jonas is a postdoctoral researcher at UMD. His background is in Mathematics, more specifically in mathematical optimization and its applications to deep learning. His current focus is on designing more secure and private ML systems, especially for federated learning, and on understanding fundamental phenomena behind generalization.

Tom Goldstein (University of Maryland)
