Text-to-image diffusion models have recently attracted considerable interest for their ability to produce high-fidelity images from text alone. However, generating an image that matches the user’s intent in a single attempt is nearly impossible, and small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) generalizes to any generative architecture using classifier-free guidance. More importantly, it allows for subtle and extensive edits, changes in composition and style, and optimization of the overall artistic conception. We demonstrate SEGA’s effectiveness on both latent and pixel-based diffusion models such as Stable Diffusion, Paella, and DeepFloyd-IF across a variety of tasks, providing strong evidence for its versatility and flexibility.
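To make the idea of steering classifier-free guidance along a semantic direction more concrete, the following is a minimal sketch and not the authors' exact formulation. It assumes three noise estimates from the same denoising step (unconditional, prompt-conditioned, and conditioned on an edit concept) are already available as arrays. The function name `guided_noise`, the parameters `edit_scale` and `top_fraction`, and the choice to restrict the edit to the largest-magnitude elements of the direction are illustrative assumptions.

```python
import numpy as np


def guided_noise(eps_uncond, eps_prompt, eps_concept,
                 guidance_scale=7.5, edit_scale=5.0,
                 top_fraction=0.05, remove=False):
    """Combine classifier-free guidance with one semantic direction.

    All names and default values here are hypothetical, chosen for
    illustration only.
    """
    # Standard classifier-free guidance: move away from the unconditional
    # estimate toward the prompt-conditioned estimate.
    base = eps_uncond + guidance_scale * (eps_prompt - eps_uncond)

    # Semantic direction: how the edit concept shifts the noise estimate.
    direction = eps_concept - eps_uncond
    if remove:
        # Flip the sign to steer away from the concept instead of toward it.
        direction = -direction

    # Assumption: apply the edit only where the direction is strongest,
    # so the rest of the image is left to the original prompt.
    threshold = np.quantile(np.abs(direction), 1.0 - top_fraction)
    mask = np.abs(direction) >= threshold

    return base + edit_scale * np.where(mask, direction, 0.0)
```

With `edit_scale=0.0` the function reduces to plain classifier-free guidance, which is one way to check that the semantic term is a pure addition on top of the usual guidance.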
Author Information
Manuel Brack (DFKI)
Felix Friedrich (TU Darmstadt, hessian.AI)
Dominik Hintersdorf (TU Darmstadt)
Lukas Struppek (Technical University of Darmstadt)
Patrick Schramowski (DFKI, Hessian.AI, TU Darmstadt)
Kristian Kersting (TU Darmstadt)
More from the Same Authors
-
2022 : Mixture of Gaussian Processes with Probabilistic Circuits for Multi-Output Regression »
Mingye Zhu · Zhongjie Yu · Martin Trapp · Arseny Skryagin · Kristian Kersting -
2023 : LEDITS++: Limitless Image Editing using Text-to-Image Models »
Manuel Brack · Linoy Tsban · Katharina Kornmeier · Apolinário Passos · Felix Friedrich · Patrick Schramowski · Kristian Kersting -
2023 : Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data »
Lukas Struppek · Martin Bernhard Hentschel · Clifton Poth · Dominik Hintersdorf · Kristian Kersting -
2023 : Defending Our Privacy With Backdoors »
Dominik Hintersdorf · Lukas Struppek · Daniel Neider · Kristian Kersting -
2023 Poster: Do Not Marginalize Mechanisms, Rather Consolidate! »
Moritz Willig · Matej Zečević · Devendra Dhami · Kristian Kersting -
2023 Poster: Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction »
Quentin Delfosse · Hikaru Shindo · Devendra Dhami · Kristian Kersting -
2023 Poster: ATMAN: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation »
Björn Deiseroth · Mayukh Deb · Samuel Weinbach · Manuel Brack · Patrick Schramowski · Kristian Kersting -
2023 Poster: MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation »
Marco Bellagente · Manuel Brack · Hannah Teufel · Felix Friedrich · Björn Deiseroth · Constantin Eichenberg · Andrew Dai · Robert Baldock · Souradeep Nanda · Koen Oostermeijer · Andres Felipe Cruz-Salinas · Patrick Schramowski · Kristian Kersting · Samuel Weinbach -
2023 Poster: Characteristic Circuits »
Zhongjie Yu · Martin Trapp · Kristian Kersting -
2023 Oral: Characteristic Circuits »
Zhongjie Yu · Martin Trapp · Kristian Kersting -
2022 : Panel »
Guy Van den Broeck · Cassio de Campos · Denis Maua · Kristian Kersting · Rianne van den Berg -
2022 Poster: LAION-5B: An open large-scale dataset for training next generation image-text models »
Christoph Schuhmann · Romain Beaumont · Richard Vencu · Cade Gordon · Ross Wightman · Mehdi Cherti · Theo Coombes · Aarush Katta · Clayton Mullis · Mitchell Wortsman · Patrick Schramowski · Srivatsa Kundurthy · Katherine Crowson · Ludwig Schmidt · Robert Kaczmarczyk · Jenia Jitsev -
2021 Poster: Interventional Sum-Product Networks: Causal Inference with Tractable Probabilistic Models »
Matej Zečević · Devendra Dhami · Athresh Karanam · Sriraam Natarajan · Kristian Kersting