The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.
Author Information
Marco Bellagente (Stability AI)
Manuel Brack (DFKI)
Hannah Teufel (Aleph Alpha)
Felix Friedrich (TU Darmstadt, hessian.AI)
Björn Deiseroth (Technische Universität Darmstadt, Aleph Alpha GmbH)
Constantin Eichenberg (Aleph Alpha)
Andrew Dai (Aleph Alpha)
Robert Baldock (Aleph Alpha)
I'm currently looking for my next position. If you might like to hire me or to collaborate, please reach out. Until a few months ago an AI Resident in Google Brain, Zurich, Robert was previously a Research Fellow at EPFL, working on Bayesian MCMC methods for statistical physics calculations. Robert also holds a PhD in the same topic from the University of Cambridge which you can find [here](https://link.springer.com/book/10.1007/978-3-319-66769-0). Robert is interested in better understanding, and advancing the state of the art of deep learning and deep RL. He has worked in Computer Vision and is curious to explore other areas as well (let's talk!)
Souradeep Nanda (UTD)
Koen Oostermeijer (Aleph Alpha)
Andres Felipe Cruz-Salinas (Aleph Alpha)
Patrick Schramowski (DFKI, hessian.AI, TU Darmstadt)
Kristian Kersting (TU Darmstadt)
Samuel Weinbach (Aleph Alpha GmbH)
More from the Same Authors
-
2021 : Latent Space Refinement for Deep Generative Models »
Ramon Winterhalder · Marco Bellagente · Benjamin Nachman -
2022 : Mixture of Gaussian Processes with Probabilistic Circuits for Multi-Output Regression »
Mingye Zhu · Zhongjie Yu · Martin Trapp · Arseny Skryagin · Kristian Kersting -
2023 : Quality-Diversity through AI Feedback »
Herbie Bradley · Andrew Dai · Hannah Teufel · Jenny Zhang · Koen Oostermeijer · Marco Bellagente · Jeff Clune · Kenneth Stanley · Grégory Schott · Joel Lehman -
2023 : Efficient Parallelization Layouts for Large-Scale Distributed Model Training »
Johannes Hagemann · Samuel Weinbach · Konstantin Dobler · Maximilian Schall · Gerard de Melo -
2023 : LEDITS++: Limitless Image Editing using Text-to-Image Models »
Manuel Brack · Linoy Tsban · Katharina Kornmeier · Apolinário Passos · Felix Friedrich · Patrick Schramowski · Kristian Kersting -
2023 : Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data »
Lukas Struppek · Martin Bernhard Hentschel · Clifton Poth · Dominik Hintersdorf · Kristian Kersting -
2023 : Defending Our Privacy With Backdoors »
Dominik Hintersdorf · Lukas Struppek · Daniel Neider · Kristian Kersting -
2023 Poster: Do Not Marginalize Mechanisms, Rather Consolidate! »
Moritz Willig · Matej Zečević · Devendra Dhami · Kristian Kersting -
2023 Poster: Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction »
Quentin Delfosse · Hikaru Shindo · Devendra Dhami · Kristian Kersting -
2023 Poster: ATMAN: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation »
Björn Deiseroth · Mayukh Deb · Samuel Weinbach · Manuel Brack · Patrick Schramowski · Kristian Kersting -
2023 Poster: SEGA: Instructing Text-to-Image Models using Semantic Guidance »
Manuel Brack · Felix Friedrich · Dominik Hintersdorf · Lukas Struppek · Patrick Schramowski · Kristian Kersting -
2023 Poster: Characteristic Circuits »
Zhongjie Yu · Martin Trapp · Kristian Kersting -
2023 Poster: Holistic Evaluation of Text-to-Image Models »
Tony Lee · Michihiro Yasunaga · Chenlin Meng · Yifan Mai · Joon Sung Park · Agrim Gupta · Yunzhi Zhang · Deepak Narayanan · Hannah Teufel · Marco Bellagente · Minguk Kang · Taesung Park · Jure Leskovec · Jun-Yan Zhu · Fei-Fei Li · Jiajun Wu · Stefano Ermon · Percy Liang -
2023 Oral: Characteristic Circuits »
Zhongjie Yu · Martin Trapp · Kristian Kersting -
2022 : Panel »
Guy Van den Broeck · Cassio de Campos · Denis Maua · Kristian Kersting · Rianne van den Berg -
2022 Poster: LAION-5B: An open large-scale dataset for training next generation image-text models »
Christoph Schuhmann · Romain Beaumont · Richard Vencu · Cade Gordon · Ross Wightman · Mehdi Cherti · Theo Coombes · Aarush Katta · Clayton Mullis · Mitchell Wortsman · Patrick Schramowski · Srivatsa Kundurthy · Katherine Crowson · Ludwig Schmidt · Robert Kaczmarczyk · Jenia Jitsev -
2021 Poster: Interventional Sum-Product Networks: Causal Inference with Tractable Probabilistic Models »
Matej Zečević · Devendra Dhami · Athresh Karanam · Sriraam Natarajan · Kristian Kersting -
2021 Poster: Deep Learning Through the Lens of Example Difficulty »
Robert Baldock · Hartmut Maennel · Behnam Neyshabur -
2020 Poster: What Do Neural Networks Learn When Trained With Random Labels? »
Hartmut Maennel · Ibrahim Alabdulmohsin · Ilya Tolstikhin · Robert Baldock · Olivier Bousquet · Sylvain Gelly · Daniel Keysers -
2020 Spotlight: What Do Neural Networks Learn When Trained With Random Labels? »
Hartmut Maennel · Ibrahim Alabdulmohsin · Ilya Tolstikhin · Robert Baldock · Olivier Bousquet · Sylvain Gelly · Daniel Keysers