How does one adapt a pretrained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting -- literally just filling in a hole in a concatenated visual prompt image -- turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated -- 88k unlabeled figures from academic paper sources on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, and edge detection. Project page: https://yossigandelsman.github.io/visual_prompt
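The prompt construction described in the abstract can be illustrated with a minimal sketch. It assumes a single example pair and a 2 x 2 grid layout with the hole in the bottom-right cell; the abstract only says the images are concatenated and a hole is filled, so the layout is an assumption here. The names `build_visual_prompt`, `visual_prompt_predict`, and `inpaint_fn` are hypothetical, and `inpaint_fn` stands in for any pretrained inpainting model; it is not the authors' code.

```python
import numpy as np


def build_visual_prompt(example_in, example_out, query_in):
    """Tile three H x W x C images into a 2H x 2W canvas.

    Assumed layout (not specified in the abstract):
        top-left: example input     top-right: example output
        bottom-left: query input    bottom-right: hole to inpaint
    Returns the canvas and a boolean mask that is True inside the hole.
    """
    h, w, c = example_in.shape
    if example_out.shape != (h, w, c) or query_in.shape != (h, w, c):
        raise ValueError("all three images must share the same shape")

    canvas = np.zeros((2 * h, 2 * w, c), dtype=example_in.dtype)
    canvas[:h, :w] = example_in
    canvas[:h, w:] = example_out
    canvas[h:, :w] = query_in

    mask = np.zeros((2 * h, 2 * w), dtype=bool)
    mask[h:, w:] = True  # the region the inpainting model must fill
    return canvas, mask


def visual_prompt_predict(inpaint_fn, example_in, example_out, query_in):
    """Run a hypothetical inpainting callable and crop out the filled cell.

    `inpaint_fn(canvas, mask)` is assumed to return an array with the same
    shape as `canvas`, with the masked region filled in.
    """
    canvas, mask = build_visual_prompt(example_in, example_out, query_in)
    filled = inpaint_fn(canvas, mask)
    h, w = query_in.shape[:2]
    return filled[h:, w:]
```

Because the task is specified only through the example pair placed in the canvas, switching from, say, segmentation to colorization means changing the example images, not the model weights.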
Author Information
Amir Bar (TAU / UC Berkeley)

Amir Bar is a fourth-year Ph.D. candidate at Tel Aviv University and a Visiting Ph.D. Researcher at UC Berkeley, advised by Amir Globerson and Trevor Darrell. His research centers on self-supervised learning and on using large amounts of unlabeled images and videos to enable computers to develop visual understanding. Lately, his focus has been on improving learning algorithms for Masked Image Modeling and on Visual Prompting, which involves adapting computer vision models at test time to novel computer vision tasks without changing the model weights or task-specific fine-tuning.
Yossi Gandelsman (UC Berkeley)
Trevor Darrell (Electrical Engineering & Computer Science Department)
Amir Globerson (Tel Aviv University, Google)
Alexei Efros (UC Berkeley)
More from the Same Authors
-
2021 : Benchmark for Compositional Text-to-Image Synthesis »
Dong Huk Park · Samaneh Azadi · Xihui Liu · Trevor Darrell · Anna Rohrbach -
2022 : Studying Bias in GANs through the Lens of Race »
Vongani Maluleke · Neerja Thakkar · Tim Brooks · Ethan Weber · Trevor Darrell · Alexei Efros · Angjoo Kanazawa · Devin Guillory -
2023 Poster: Diffusion Self-Guidance for Controllable Image Generation »
Dave Epstein · Allan Jabri · Ben Poole · Alexei Efros · Aleksander Holynski -
2023 Poster: Hierarchical Open-vocabulary Universal Image Segmentation »
Xudong Wang · Shufan Li · Konstantinos Kallidromitis · Yusuke Kato · Kazuki Kozuka · Trevor Darrell -
2023 Poster: Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence »
Grace Luo · Lisa Dunlap · Dong Huk Park · Aleksander Holynski · Trevor Darrell -
2023 Poster: Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives »
Tom Monnier · Jake Austin · Angjoo Kanazawa · Alexei Efros · Mathieu Aubry -
2023 Poster: Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation »
Lisa Dunlap · Alyssa Umino · Han Zhang · Jiezhi Yang · Joseph Gonzalez · Trevor Darrell -
2023 Poster: Language Models are Visual Reasoning Coordinators »
Liangyu Chen · Bo Li · Sheng Shen · Jingkang Yang · Chunyuan Li · Kurt Keutzer · Trevor Darrell · Ziwei Liu -
2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge »
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao -
2022 Poster: Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens »
Elad Ben Avraham · Roei Herzig · Karttikeya Mangalam · Amir Bar · Anna Rohrbach · Leonid Karlinsky · Trevor Darrell · Amir Globerson -
2022 Poster: Test-Time Training with Masked Autoencoders »
Yossi Gandelsman · Yu Sun · Xinlei Chen · Alexei Efros -
2022 Poster: Generating Long Videos of Dynamic Scenes »
Tim Brooks · Janne Hellsten · Miika Aittala · Ting-Chun Wang · Timo Aila · Jaakko Lehtinen · Ming-Yu Liu · Alexei Efros · Tero Karras -
2021 Poster: A Theoretical Analysis of Fine-tuning with Linear Teachers »
Gal Shachaf · Alon Brutzkus · Amir Globerson -
2021 Poster: CLIP-It! Language-Guided Video Summarization »
Medhini Narasimhan · Anna Rohrbach · Trevor Darrell -
2021 Poster: Early Convolutions Help Transformers See Better »
Tete Xiao · Mannat Singh · Eric Mintun · Trevor Darrell · Piotr Dollar · Ross Girshick -
2021 Poster: Teachable Reinforcement Learning via Advice Distillation »
Olivia Watkins · Abhishek Gupta · Trevor Darrell · Pieter Abbeel · Jacob Andreas -
2021 Poster: MarioNette: Self-Supervised Sprite Learning »
Dmitriy Smirnov · Michael Gharbi · Matthew Fisher · Vitor Guizilini · Alexei Efros · Justin Solomon -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 : QA: Alexei Efros »
Alexei Efros -
2020 : Invited Talk: Alexei Efros »
Alexei Efros -
2020 Poster: Space-Time Correspondence as a Contrastive Random Walk »
Allan Jabri · Andrew Owens · Alexei Efros -
2020 Oral: Space-Time Correspondence as a Contrastive Random Walk »
Allan Jabri · Andrew Owens · Alexei Efros -
2020 Poster: Swapping Autoencoder for Deep Image Manipulation »
Taesung Park · Jun-Yan Zhu · Oliver Wang · Jingwan Lu · Eli Shechtman · Alexei Efros · Richard Zhang -
2019 : Poster Presentations »
Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange -
2019 : Oral Presentations »
Janith Petangoda · Sergio Pascual-Diaz · Jordi Grau-Moya · Raphaël Marinier · Olivier Pietquin · Alexei Efros · Phillip Isola · Trevor Darrell · Christopher Lu · Deepak Pathak · Johan Ferret -
2019 Poster: Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity »
Deepak Pathak · Christopher Lu · Trevor Darrell · Phillip Isola · Alexei Efros -
2019 Spotlight: Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity »
Deepak Pathak · Christopher Lu · Trevor Darrell · Phillip Isola · Alexei Efros -
2017 : How to stop worrying and learn to love Nearest Neighbors »
Alexei Efros -
2017 Poster: Robust Conditional Probabilities »
Yoav Wald · Amir Globerson -
2017 Poster: Toward Multimodal Image-to-Image Translation »
Jun-Yan Zhu · Richard Zhang · Deepak Pathak · Trevor Darrell · Alexei Efros · Oliver Wang · Eli Shechtman -
2016 : What makes ImageNet good for Transfer Learning? »
Jacob MY Huh · Pulkit Agrawal · Alexei Efros