Novel architectures have recently improved generative image synthesis, leading to excellent visual quality across various tasks. Much of this success stems from the scalability of these architectures, and hence from a dramatic increase in model complexity and in the computational resources invested in training. Our work questions the underlying paradigm of compressing large training data into ever-growing parametric representations. Instead, we present an orthogonal, semi-parametric approach: we complement comparably small diffusion or autoregressive models with a separate image database and a retrieval strategy. During training, we retrieve a set of nearest neighbors from this external database for each training instance and condition the generative model on these informative samples. While the retrieval approach provides the (local) content, the model focuses on learning the composition of scenes based on this content. As demonstrated by our experiments, simply swapping the database for one with different content transfers a trained model post hoc to a novel domain. Our evaluation shows competitive performance on tasks the generative model has not been trained on, such as class-conditional synthesis, zero-shot stylization, or text-to-image synthesis without requiring paired text-image data. With negligible memory and computational overhead for the external database and retrieval, we can significantly reduce the parameter count of the generative model and still outperform the state of the art.
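The retrieval step described in the abstract can be illustrated with a minimal sketch: embed the database images, and for each training instance fetch its nearest neighbors by cosine similarity to use as conditioning. This is an assumption-laden toy (brute-force NumPy search over random vectors standing in for image embeddings; function names are illustrative, not the paper's code):

```python
import numpy as np

def build_index(db_embeddings):
    # L2-normalize database embeddings so a dot product equals cosine similarity
    norms = np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    return db_embeddings / np.clip(norms, 1e-8, None)

def retrieve_neighbors(index, query_emb, k=4):
    # normalize the query, score against the whole index, keep the top-k
    q = query_emb / max(np.linalg.norm(query_emb), 1e-8)
    sims = index @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# toy database: 100 "images" with 512-d embeddings (stand-ins for real features)
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 512))
index = build_index(db)

# a query that is a slightly perturbed copy of database item 7
query = db[7] + 0.01 * rng.normal(size=512)
ids, sims = retrieve_neighbors(index, query, k=4)
# index[ids] would then be passed to the generative model as the
# conditioning set of informative neighbor embeddings
```

In the actual approach, the database and retrieval are external to the model, so swapping `db` for embeddings of a different image collection changes the available content without retraining.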
Author Information
Andreas Blattmann (NVIDIA, LMU Munich, Heidelberg University)
Robin Rombach (Heidelberg University, LMU Munich)
Kaan Oktay (Ludwig-Maximilians-Universität München)
Jonas Müller (Ludwig-Maximilians-Universität München)
Björn Ommer (University of Munich)
More from the Same Authors
- 2021 Poster: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning
  Timo Milbich · Karsten Roth · Samarth Sinha · Ludwig Schmidt · Marzyeh Ghassemi · Bjorn Ommer
- 2021 Poster: ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
  Patrick Esser · Robin Rombach · Andreas Blattmann · Bjorn Ommer
- 2020: An Image is Worth 16 × 16 Tokens: Visual Priors for Efficient Image Synthesis with Transformers
  Robin Rombach
- 2020 Poster: Network-to-Network Translation with Conditional Invertible Neural Networks
  Robin Rombach · Patrick Esser · Bjorn Ommer
- 2020 Oral: Network-to-Network Translation with Conditional Invertible Neural Networks
  Robin Rombach · Patrick Esser · Bjorn Ommer
- 2016 Poster: CliqueCNN: Deep Unsupervised Exemplar Learning
  Miguel A Bautista · Artsiom Sanakoyeu · Ekaterina Tikhoncheva · Bjorn Ommer
- 2012 Poster: Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
  Angela Eigenstetter · Bjorn Ommer