Poster
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi · Jakob Verbeek · Pietro Astolfi · Michal Drozdzal · Adriana Romero · Marton Havasi · Matthew Muckley · Yohann Benchetrit · Karteek Alahari · Melissa Hall · Reyhane Askari Hemmat
East Exhibit Hall A-C #2705
Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, large-scale end-to-end training of these models is computationally costly, and hence most research focuses either on finetuning pretrained models or on experiments at smaller scales. In this work we aim to improve the training efficiency and performance of LDMs with the goal of scaling to larger datasets and higher resolutions. We focus our study on two points that are critical for good performance and efficient training: (i) the mechanisms used for semantic-level (e.g., a text prompt or class name) and low-level (crop size, random flip, etc.) conditioning of the model, and (ii) pre-training strategies to transfer representations learned on smaller, lower-resolution datasets to larger ones. The main contributions of our work are the following: we present a systematic experimental study of these points, we propose a novel conditioning mechanism that disentangles semantic and low-level conditioning, and we obtain state-of-the-art performance on CC12M for text-to-image generation at 512 resolution.
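To make the disentangling idea concrete, below is a minimal sketch of separating semantic and low-level conditioning into two pathways rather than summing them into a single vector. This is not the paper's exact mechanism: the module name, dimensions, and the choice of metadata inputs (crop offsets, original size, flip flag) are illustrative assumptions, and the downstream backbone is assumed to inject the two vectors at different points (e.g., cross-attention for the semantic signal, adaptive normalization for the low-level signal).

# Hypothetical sketch, not the authors' implementation: keep semantic and
# low-level conditioning on separate pathways so the two signals stay
# disentangled instead of being fused into one conditioning vector.
import torch
import torch.nn as nn

class DisentangledConditioner(nn.Module):
    def __init__(self, sem_dim=768, meta_dim=4, hidden=256, out_dim=512):
        super().__init__()
        # Semantic pathway: e.g., a pooled text-prompt or class embedding.
        self.sem_mlp = nn.Sequential(
            nn.Linear(sem_dim, out_dim), nn.SiLU(), nn.Linear(out_dim, out_dim)
        )
        # Low-level pathway: e.g., crop coordinates, original size, flip flag.
        self.meta_mlp = nn.Sequential(
            nn.Linear(meta_dim, hidden), nn.SiLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, sem_emb, meta):
        # Return the two conditioning vectors separately; the diffusion
        # backbone can consume them through different mechanisms.
        return self.sem_mlp(sem_emb), self.meta_mlp(meta)

if __name__ == "__main__":
    cond = DisentangledConditioner()
    sem = torch.randn(2, 768)                       # pooled prompt embeddings
    meta = torch.tensor([[0.0, 0.0, 512.0, 1.0],    # crop offset, size, flip
                         [32.0, 16.0, 512.0, 0.0]])
    c_sem, c_meta = cond(sem, meta)
    print(c_sem.shape, c_meta.shape)                # torch.Size([2, 512]) each

The key design point illustrated here is that low-level augmentation metadata never mixes with the semantic embedding before reaching the backbone, which is the sense in which the two kinds of conditioning remain disentangled.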