Demonstrations and natural language instructions are two common ways to specify and teach robots novel tasks. However, for many complex tasks, a demonstration or language instruction alone contains ambiguities, preventing tasks from being specified clearly. In such cases, a combination of a demonstration and an instruction conveys the task to the robot more concisely and effectively than either modality alone. To instantiate this problem setting, we train a single multi-task policy on a few hundred challenging robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task Conditioning), a method for conditioning a robotic policy on task embeddings composed of two components: a visual demonstration and a language instruction. By allowing these two modalities to mutually disambiguate and clarify each other during novel task specification, DeL-TaCo (1) substantially decreases the teacher effort needed to specify a new task and (2) achieves better generalization performance on novel objects and instructions than previous task-conditioning methods. To our knowledge, this is the first work to show that simultaneously conditioning a multi-task robotic manipulation policy on both demonstration and language embeddings improves sample efficiency and generalization over conditioning on either modality alone.
Author Information
Albert Yu (University of Texas at Austin)
Raymond Mooney (University of Texas at Austin)
More from the Same Authors
- 2022: Zero-shot Video Moment Retrieval With Off-the-Shelf Models
  Anuj Diwan · Puyuan Peng · Raymond Mooney
- 2022: Language-guided Task Adaptation for Imitation Learning
  Prasoon Goyal · Raymond Mooney · Scott Niekum
- 2019 Poster: Self-Critical Reasoning for Robust Visual Question Answering
  Jialin Wu · Raymond Mooney
- 2019 Spotlight: Self-Critical Reasoning for Robust Visual Question Answering
  Jialin Wu · Raymond Mooney
- 2018: Learning to Understand Natural Language Instructions through Human-Robot Dialog
  Raymond Mooney
- 2017: Panel Discussion
  Felix Hill · Olivier Pietquin · Jack Gallant · Raymond Mooney · Sanja Fidler · Chen Yu · Devi Parikh
- 2017: Visually Grounded Language: Past, Present, and Future...
  Raymond Mooney
- 2015: Generating Natural-Language Video Descriptions using LSTM Recurrent Neural Networks
  Raymond Mooney
- 2011 Workshop: Integrating Language and Vision
  Raymond Mooney · Trevor Darrell · Kate Saenko