MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Marco Bellagente · Manuel Brack · Hannah Teufel · Felix Friedrich · Björn Deiseroth · Constantin Eichenberg · Andrew Dai · Robert Baldock · Souradeep Nanda · Koen Oostermeijer · Andres Felipe Cruz-Salinas · Patrick Schramowski · Kristian Kersting · Samuel Weinbach

Tue Dec 12 03:15 PM -- 05:15 PM (PST) @ Great Hall & Hall B1+B2 #612

The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows users to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from the individual modules to the downstream model. Specifically, fusing all independent components allows the image generation module to use multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.

Author Information

Marco Bellagente (Stability AI)
Manuel Brack (DFKI)
Hannah Teufel (Aleph Alpha)
Felix Friedrich (TU Darmstadt, hessian.AI)
Björn Deiseroth (Technische Universität Darmstadt, Aleph Alpha GmbH)
Constantin Eichenberg (Aleph Alpha)
Andrew Dai (Aleph Alpha)
Robert Baldock (Aleph Alpha)

I'm currently looking for my next position. If you would like to hire me or collaborate, please reach out. Until a few months ago Robert was an AI Resident at Google Brain, Zurich; before that he was a Research Fellow at EPFL, working on Bayesian MCMC methods for statistical physics calculations. Robert also holds a PhD on the same topic from the University of Cambridge, which you can find [here](https://link.springer.com/book/10.1007/978-3-319-66769-0). Robert is interested in better understanding deep learning and deep RL and in advancing their state of the art. He has worked in computer vision and is curious to explore other areas as well (let's talk!).

Souradeep Nanda (UTD)
Koen Oostermeijer (Aleph Alpha)
Andres Felipe Cruz-Salinas (Aleph Alpha)
Patrick Schramowski (DFKI, Hessian.AI, TU Darmstadt)
Kristian Kersting (TU Darmstadt)
Samuel Weinbach (Aleph Alpha GmbH)
