Visually Grounded Interaction and Language
Florian Strub · Abhishek Das · Erik Wijmans · Harm de Vries · Stefan Lee · Alane Suhr · Dor Arad Hudson

Fri Dec 13 08:00 AM -- 06:15 PM (PST) @ West 202 - 204
Event URL: https://vigilworkshop.github.io

The dominant paradigm in modern natural language understanding is learning statistical language models from text-only corpora. This approach is founded on a distributional notion of semantics, i.e. that the "meaning" of a word is based only on its relationship to other words. While effective for many applications, this approach suffers from limited semantic understanding -- symbols learned this way lack any concrete groundings into the multimodal, interactive environment in which communication takes place. The symbol grounding problem first highlighted this limitation: "meaningless symbols (i.e. words) cannot be grounded in anything but other meaningless symbols".

On the other hand, humans acquire language by communicating about and interacting within a rich, perceptual environment -- providing concrete groundings, e.g. to objects or concepts either physical or psychological. Thus, recent works have aimed to bridge computer vision, interactive learning, and natural language understanding through language learning tasks based on natural images or through embodied agents performing interactive tasks in physically simulated environments, often drawing on the recent successes of deep learning and reinforcement learning. We believe these lines of research offer a promising approach for building models that do grasp the world's underlying complexity.

The goal of this third ViGIL workshop is to bring together scientists from various backgrounds - machine learning, computer vision, natural language processing, neuroscience, cognitive science, psychology, and philosophy - to share their perspectives on grounding, embodiment, and interaction. By providing this opportunity for cross-discipline discussion, we hope to foster new ideas about how to learn and leverage grounding in machines as well as build new bridges between the science of human cognition and machine learning.

Author Information

Florian Strub (DeepMind)
Abhishek Das (Georgia Tech)

CS PhD student at Georgia Tech. Learning to build machines that can see, think and talk. Interested in Deep Learning / Computer Vision.

Erik Wijmans (Georgia Institute of Technology)
Harm de Vries (Element AI)
Stefan Lee (Oregon State University)
Alane Suhr (Cornell)
Dor Arad Hudson (Stanford University)
