We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is refined, evolving into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space, towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing that it achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide deeper insight into its generative process, as it proceeds step by step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes. See https://github.com/dorarad/gansformer for model implementation.
Author Information
Dor Arad Hudson (Stanford University)
Larry Zitnick (Facebook AI Research)
More from the Same Authors
- 2022: Fifteen-minute Competition Overview Video
  Abhishek Das · Zachary Ulissi · Siddharth Goyal · Nima Shoghi · Muhammed Shuaibi · Larry Zitnick · Janice Lan · Brook Wander · Brandon Wood · Anuroop Sriram · Ammar Rizvi · Aini Palizhati · Adeesh Kolluru
- 2022 Competition: Open Catalyst Challenge
  Abhishek Das · Muhammed Shuaibi · Aini Palizhati · Siddharth Goyal · Adeesh Kolluru · Janice Lan · Ammar Rizvi · Nima Shoghi · Anuroop Sriram · Brook Wander · Brandon Wood · Zachary Ulissi · Larry Zitnick
- 2022 Poster: Spherical Channels for Modeling Atomic Interactions
  Larry Zitnick · Abhishek Das · Adeesh Kolluru · Janice Lan · Muhammed Shuaibi · Anuroop Sriram · Zachary Ulissi · Brandon Wood
- 2021 Workshop: CtrlGen: Controllable Generative Modeling in Language and Vision
  Steven Y. Feng · Dor Arad Hudson · Tatsunori Hashimoto · DONGYEOP Kang · Varun Prashant Gangal · Anusha Balakrishnan · Joel Tetreault
- 2021: Open Catalyst Challenge + Q&A
  Abhishek Das · Muhammed Shuaibi · Siddharth Goyal · Adeesh Kolluru · Janice Lan · Aini Palizhati · Anuroop Sriram · Brandon Wood · Aditya Grover · Devi Parikh · Zachary Ulissi · Larry Zitnick
- 2020: Contributed Talk - Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
  Alexander Rives · Siddharth Goyal · Joshua Meier · Zeming Lin · Demi Guo · Myle Ott · Larry Zitnick · Rob Fergus
- 2020: Larry Zitnick - fastMRI
  Larry Zitnick
- 2019: fastMRI Challenge Talks
  Nafisa Yakubova · Nicola Pezzotti · Puyang Wang · Larry Zitnick · Dimitrios Karkalousos · Shanhui Sun · Matthan Caan · Tullie Murrell · Patrick Putzky
- 2019: Opening Remarks
  Florian Strub · Harm de Vries · Abhishek Das · Stefan Lee · Erik Wijmans · Dor Arad Hudson · Alane Suhr
- 2019 Workshop: Visually Grounded Interaction and Language
  Florian Strub · Abhishek Das · Erik Wijmans · Harm de Vries · Stefan Lee · Alane Suhr · Dor Arad Hudson
- 2017 Poster: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
  Yuandong Tian · Qucheng Gong · Wendy Shang · Yuxin Wu · Larry Zitnick
- 2017 Oral: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
  Yuandong Tian · Qucheng Gong · Wendy Shang · Yuxin Wu · Larry Zitnick
- 2016: Gathering common sense knowledge: how to game it?
  Larry Zitnick