Visually Grounded Interaction and Language
Florian Strub · Abhishek Das · Erik Wijmans · Harm de Vries · Stefan Lee · Alane Suhr · Drew Arad Hudson

Fri Dec 13th 08:00 AM -- 06:15 PM @ West 202 - 204

The dominant paradigm in modern natural language understanding is learning statistical language models from text-only corpora. This approach is founded on a distributional notion of semantics, i.e. that the "meaning" of a word is based only on its relationship to other words. While effective for many applications, this approach suffers from limited semantic understanding: symbols learned this way lack any concrete grounding in the multimodal, interactive environment in which communication takes place. The symbol grounding problem first highlighted this limitation, that "meaningless symbols (i.e. words) cannot be grounded in anything but other meaningless symbols".

On the other hand, humans acquire language by communicating about and interacting within a rich, perceptual environment, which provides concrete groundings, e.g. to objects or concepts, either physical or psychological. Thus, recent works have aimed to bridge computer vision, interactive learning, and natural language understanding through language learning tasks based on natural images or through embodied agents performing interactive tasks in physically simulated environments, often drawing on the recent successes of deep learning and reinforcement learning. We believe these lines of research offer a promising approach for building models that do grasp the world's underlying complexity.

The goal of this third ViGIL workshop is to bring together scientists from various backgrounds - machine learning, computer vision, natural language processing, neuroscience, cognitive science, psychology, and philosophy - to share their perspectives on grounding, embodiment, and interaction. By providing this opportunity for cross-discipline discussion, we hope to foster new ideas about how to learn and leverage grounding in machines as well as build new bridges between the science of human cognition and machine learning.

08:20 AM Opening Remarks Florian Strub, Harm de Vries, Abhishek Das, Stefan Lee, Erik Wijmans, Drew Arad Hudson, Alane Suhr
08:30 AM Grasping Language (Talk) Jason Baldridge
09:10 AM From Human Language to Agent Action (Talk) Jesse Thomason
09:50 AM Coffee Break
10:30 AM Spotlight
10:50 AM Why language understanding is not a solved problem (Talk) Jay McClelland
11:30 AM Louis-Philippe Morency (Talk) LP Morency
12:10 PM Poster session (Poster Session)
Candace Ross, Yassine Mrabet, Sanjay Subramanian, Geoffrey Cideron, Jesse Mu, Suvrat Bhooshan, Eda Okur Kavil, Jean-Benoit Delbrouck, Yen-Ling Kuo, Nicolas Lair, Gabriel Ilharco, T.S. Jayram, Alba María Herrera Palacio, Chihiro Fujiyama, Olivier Tieleman, Anna Potapenko, Guan-Lin Chao, Thomas M. Sutter, Olga Kovaleva, Farley Lai, Xin Wang, Vasu Sharma, Catalina Cangea, Nikhil Krishnaswamy, Yuta Tsuboi, Alexander Kuhnle, Khanh Nguyen, Dian Yu, Homagni Saha, Jiannan Xiang, Vijay Venkataraman, Ankita Kalra, Ning Xie, Derek Doran, Travis Goodwin, Asim Kadav, Shabnam Daghaghi, Jason Baldridge, Jialin Wu, Jingxiang Lin, Unnat Jain
01:50 PM Lisa Anne Hendricks (Talk) Lisa Anne Hendricks
02:30 PM Linda Smith (Talk) Linda Smith
03:10 PM Poster Session (Coffee Break)
04:00 PM Timothy Lillicrap (Talk) Timothy Lillicrap
04:40 PM Josh Tenenbaum (Talk) Josh Tenenbaum
05:20 PM Panel Discussion Linda Smith, Josh Tenenbaum, Lisa Anne Hendricks, Jay McClelland, Timothy Lillicrap, Jesse Thomason, Jason Baldridge, LP Morency
06:00 PM Closing Remarks

Author Information

Florian Strub (DeepMind)
Abhishek Das (Georgia Tech)

CS PhD student at Georgia Tech. Learning to build machines that can see, think and talk. Interested in Deep Learning / Computer Vision.

Erik Wijmans (Georgia Institute of Technology)
Harm de Vries (Element AI)
Stefan Lee (Oregon State University)
Alane Suhr (Cornell)
Drew Arad Hudson (Stanford University)
