Visually grounded interaction and language
Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary

Fri Dec 08 08:00 AM -- 06:30 PM (PST) @ 101 B
Event URL: https://nips2017vigil.github.io/

Everyday interactions require a common understanding of language, i.e., for people to communicate effectively, words (for example, 'cat') should invoke similar beliefs over physical concepts (what cats look like, the sounds they make, how they behave, what their fur feels like, etc.). However, how this 'common understanding' emerges is still unclear.

One appealing hypothesis is that language is tied to how we interact with the environment. As a result, meaning emerges by ‘grounding’ language in modalities in our environment (images, sounds, actions, etc.).

Recent concurrent work in machine learning has focused on bridging visual and natural language understanding through visually-grounded language learning tasks, e.g. through natural images (Visual Question Answering, Visual Dialog) or through interactions with virtual physical environments. In cognitive science, progress in fMRI has enabled the creation of a semantic atlas of the cerebral cortex and the decoding of semantic information from visual input. And in psychology, recent studies show that a baby's most likely first words are based on their visual experience, laying the foundation for a new theory of infant language acquisition and learning.

As the grounding problem requires an interdisciplinary approach, this workshop aims to gather researchers with broad expertise in various fields — machine learning, computer vision, natural language processing, neuroscience, and psychology — to discuss their cutting-edge work as well as perspectives on future directions in this exciting space of grounding and interaction.

We will accept papers related to:
— language acquisition or learning through interactions
— visual captioning, dialog, and question-answering
— reasoning in language and vision
— visual synthesis from language
— transfer learning in language and vision tasks
— navigation in virtual worlds with natural-language instructions
— machine translation with visual cues
— novel tasks that combine language, vision and actions
— understanding and modeling the relationship between language and vision in humans
— semantic systems and modeling of natural language and visual stimuli representations in the human brain

Important dates
Submission deadline: 3rd November 2017
Extended Submission deadline: 17th November 2017

Acceptance notification (First deadline): 10th November 2017
Acceptance notification (Second deadline): 24th November 2017

Workshop: 8th December 2017

Paper details
— Contributed papers may include novel research, preliminary results, extended abstracts, position papers, or surveys
— Papers are limited to 4 pages, excluding references, in the latest camera-ready NIPS format: https://nips.cc/Conferences/2017/PaperInformation/StyleFiles
— Papers published at the main conference can be submitted without reformatting
— Please submit via email: nips2017vigil@gmail.com

Accepted papers
— All accepted papers will be presented during 2 poster sessions
— Up to 5 accepted papers will be invited to deliver short talks
— Accepted papers will be made publicly available as non-archival reports, allowing future submissions to archival conferences and journals

Invited Speakers
Raymond J. Mooney - University of Texas
Sanja Fidler - University of Toronto
Olivier Pietquin - DeepMind
Jack Gallant - University of California, Berkeley
Devi Parikh - Georgia Tech / FAIR
Felix Hill - DeepMind
Chen Yu - Indiana University

Author Information

Florian Strub (Univ Lille1, CRIStAL, Inria - SequeL Team)
Harm de Vries (Université de Montréal)
Abhishek Das (Georgia Tech)

CS PhD student at Georgia Tech. Learning to build machines that can see, think and talk. Interested in Deep Learning / Computer Vision.

Satwik Kottur (Carnegie Mellon University)
Stefan Lee (Georgia Tech)
Mateusz Malinowski (DeepMind)

Mateusz Malinowski is a research scientist at DeepMind, where he works at the intersection of computer vision, natural language understanding, and deep learning. He received his PhD (Dr.-Ing.) with the highest honor (summa cum laude) from the Max Planck Institute for Informatics in 2017 for his pioneering work on visual question answering, in which he proposed the task and developed methods that answer questions about the content of images. Prior to this, he graduated with honors in computer science from Saarland University. Before that, he studied computer science at Wroclaw University in Poland.

Olivier Pietquin (Google DeepMind)
Devi Parikh (Georgia Tech / Facebook AI Research (FAIR))
Dhruv Batra (Georgia Tech / Facebook AI Research (FAIR))
Aaron Courville (U. Montreal)
Jeremie Mary (INRIA / Univ. Lille)