Poster

How Diffusion Models Learn to Factorize and Compose

Qiyao Liang · Ziming Liu · Mitchell Ostrow · Ila Fiete


Abstract:

Diffusion models are capable of generating photo-realistic images that combine elements which do not appear together in natural images, demonstrating their ability to compositionally generalize. Nonetheless, the precise mechanism of compositionality and how it is acquired through training remain elusive. Here, we consider a highly reduced setting to examine whether diffusion models learn semantically meaningful and fully factorized representations of composable features. We performed extensive controlled experiments on conditional DDPMs trained to generate various forms of 2D Gaussian data. We demonstrate that the models learn factorized, semi-continuous manifold representations that are orthogonal in underlying continuous latent features of independent variations but are not aligned for different values of the same feature. With such representations, models demonstrate superior compositionality but have limited ability to interpolate over unseen values of a given feature. Our experimental results further demonstrate that diffusion models can attain compositionality with a small number of compositional examples, suggesting a novel way to train DDPMs. Finally, we connect manifold formation in diffusion models to percolation theory in physics, thereby offering insights into the sudden onset of factorized representation learning. Our thorough toy experiments thus contribute a deeper understanding of how diffusion models capture compositional structure in data, paving the way for future research aimed at enhancing factorization and compositional generalization in generative models for real-world applications.
