Timezone: »
We present a scalable approach for learning open-world object-goal navigation (ObjectNav) – the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., “find a sink”). Our approach is entirely zero-shot – i.e., it does not require ObjectNav rewards or demonstrations of any kind. Instead, we train on the image-goal navigation (ImageNav) task, in which agents find the location where a picture (i.e., goal image) was captured. Specifically, we encode goal images into a multimodal, semantic embedding space to enable training semantic-goal navigation (SemanticNav) agents at scale in unannotated 3D environments (e.g., HM3D). After training, SemanticNav agents can be instructed to find objects described in free-form natural language (e.g., “sink,” “bathroom sink,” etc.) by projecting language goals into the same multimodal, semantic embedding space. As a result, our approach enables open-world ObjectNav. We extensively evaluate our agents on three ObjectNav datasets (Gibson, HM3D, and MP3D) and observe absolute improvements in success of 4.2% - 20.0% over existing zero-shot methods. For reference, these gains are similar or better than the 5% improvement in success between the Habitat 2020 and 2021 ObjectNav challenge winners. In an open-world setting, we discover that our agents can generalize to compound instructions with a room explicitly mentioned (e.g., “Find a kitchen sink”) and when the target room can be inferred (e.g., “Find a sink and a stove”).
Author Information
Arjun Majumdar (Georgia Institute of Technology)
Gunjan Aggarwal (Georgia Tech)
Bhavika Devnani (Georgia Institute of Technology)
Judy Hoffman (Georgia Institute of Technology)
Dhruv Batra (FAIR (Meta) / Georgia Tech)
More from the Same Authors
-
2021 Spotlight: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI »
Santhosh Kumar Ramakrishnan · Aaron Gokaslan · Erik Wijmans · Oleksandr Maksymets · Alexander Clegg · John Turner · Eric Undersander · Wojciech Galuba · Andrew Westbury · Angel Chang · Manolis Savva · Yili Zhao · Dhruv Batra -
2022 : Fifteen-minute Competition Overview Video »
Dhruv Batra · Manolis Savva · Zsolt Kira · Vincent-Pierre Berges · Karmesh Yadav · Angel Chang · Andrew Szot · Alexander Clegg · Aaron Gokaslan -
2023 Poster: LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images »
Viraj Prabhu · Sriram Yenamandra · Prithvijit Chattopadhyay · Judy Hoffman -
2023 Poster: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? »
Arjun Majumdar · Karmesh Yadav · Sergio Arnaud · Jason Yecheng Ma · Claire Chen · Sneha Silwal · Aryan Jain · Vincent-Pierre Berges · Tingfan Wu · Jay Vakil · Pieter Abbeel · Jitendra Malik · Dhruv Batra · Yixin Lin · Oleksandr Maksymets · Aravind Rajeswaran · Franziska Meier -
2023 Poster: Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks »
Micah Goldblum · Hossein Souri · Renkun Ni · Manli Shu · Viraj Prabhu · Gowthami Somepalli · Prithvijit Chattopadhyay · Adrien Bardes · Mark Ibrahim · Judy Hoffman · Rama Chellappa · Andrew Wilson · Tom Goldstein -
2023 Competition: The HomeRobot Open Vocabulary Mobile Manipulation Challenge »
Sriram Yenamandra · Arun Ramachandran · Mukul Khanna · Karmesh Yadav · Devendra Singh Chaplot · Gunjan Chhablani · Alexander Clegg · Theophile Gervet · Vidhi Jain · Ruslan Partsey · Ram Ramrakhya · Andrew Szot · Austin Wang · Tsung-Yen Yang · Aaron Edsinger · Charles Kemp · Binit Shah · Zsolt Kira · Dhruv Batra · Roozbeh Mottaghi · Yonatan Bisk · Chris Paxton -
2022 : Bi-Directional Self-Attention for Vision Transformers »
George Stoica · Taylor Hearn · Bhavika Devnani · Judy Hoffman -
2022 Competition: Habitat Rearrangement Challenge »
Andrew Szot · Karmesh Yadav · Alexander Clegg · Vincent-Pierre Berges · Aaron Gokaslan · Angel Chang · Manolis Savva · Zsolt Kira · Dhruv Batra -
2022 Poster: VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement »
Erik Wijmans · Irfan Essa · Dhruv Batra -
2022 Poster: SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning »
Changan Chen · Carl Schissler · Sanchit Garg · Philip Kobernik · Alexander Clegg · Paul Calamia · Dhruv Batra · Philip Robinson · Kristen Grauman -
2022 Poster: Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency »
Viraj Prabhu · Sriram Yenamandra · Aaditya Singh · Judy Hoffman -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 Poster: SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation »
Abhinav Moudgil · Arjun Majumdar · Harsh Agrawal · Stefan Lee · Dhruv Batra -
2021 Poster: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2018 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Erik Wijmans · Samyak Datta · Ethan Perez · Mateusz Malinowski · Stefan Lee · Peter Anderson · Aaron Courville · Jeremie MARY · Dhruv Batra · Devi Parikh · Olivier Pietquin · Chiori HORI · Tim Marks · Anoop Cherian -
2017 : Morning panel discussion »
Jürgen Schmidhuber · Noah Goodman · Anca Dragan · Pushmeet Kohli · Dhruv Batra -
2017 : Invited Talk 2 »
Dhruv Batra -
2017 Poster: Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model »
Jiasen Lu · Anitha Kannan · Jianwei Yang · Devi Parikh · Dhruv Batra