Grasping Language
Jason Baldridge

Fri Dec 13 08:30 AM -- 09:10 AM (PST)

There is a usability gap between manipulation-capable robots and helpful in-home digital agents. Dialog-enabled smart assistants have recently seen widespread adoption, but they cannot move or manipulate objects. By contrast, manipulation-capable and mobile robots are still largely deployed in industrial settings and do not interact with human users. Language-enabled robots can bridge this gap: natural language interfaces help robots and non-experts collaborate to achieve their goals. Navigation in unexplored environments to high-level targets like "Go to the room with a plant" can be facilitated by enabling agents to ask questions and react to human clarifications on the fly. Further, high-level instructions like "Put a plate of toast on the table" require inferring many steps, from finding a knife to operating a toaster. Low-level instructions can serve to clarify these individual steps. Through two new datasets and accompanying models, we study human-human dialog for cooperative navigation, and high- and low-level language instructions for cooking, cleaning, and tidying in interactive home environments. These datasets are a first step towards collaborative, dialog-enabled robots that are helpful in human spaces.

Author Information

Jason Baldridge (Google)

Jason Baldridge is a research scientist at Google working on grounded language understanding, with a focus on vision-and-language navigation, spatiotemporal representations, and connecting vision and language. Prior to Google, Jason was a professor at the University of Texas at Austin, where he worked on categorial grammar, parsing, coreference, sentiment analysis, discourse structure, geolocation, and NLP for low-resource languages.

