
Counterfactual Vision-and-Language Navigation: Unravelling the Unseen

Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Anton van den Hengel

Spotlight presentation: Orals & Spotlights Track 22: Vision Applications
on 2020-12-09T20:10:00-08:00 - 2020-12-09T20:20:00-08:00
Poster Session 5
on 2020-12-09T21:00:00-08:00 - 2020-12-09T23:00:00-08:00
Abstract: The task of vision-and-language navigation (VLN) requires an agent to follow text instructions to find its way through simulated household environments. A prominent challenge is to train an agent capable of generalising to new environments at test time, rather than one that simply memorises trajectories and visual details observed during training. We propose a new learning strategy that learns both from observations and generated counterfactual environments. We describe an effective algorithm to generate counterfactual observations on the fly for VLN, as linear combinations of existing environments. Simultaneously, we encourage the agent's actions to remain stable between original and counterfactual environments through our novel training objective, effectively removing the spurious features that otherwise bias the agent. Our experiments show that this technique provides significant improvements in generalisation on benchmarks for Room-to-Room navigation and Embodied Question Answering.
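
The two ideas in the abstract, counterfactual observations formed as linear combinations of existing environments and an objective that keeps actions stable across original and counterfactual inputs, can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how the mixing weights are chosen or what form the objective takes. The function names, the scalar weight `u`, and the KL-divergence penalty below are all assumptions made for illustration only.

```python
import numpy as np


def counterfactual_features(obs_a, obs_b, u):
    """Blend two observation feature arrays with weight u in [0, 1].

    Assumption: a single scalar weight is used; the abstract only says
    counterfactuals are linear combinations of existing environments.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return u * obs_a + (1.0 - u) * obs_b


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def action_stability_penalty(logits_orig, logits_cf):
    """Mean KL(p_orig || p_cf) between action distributions.

    Assumption: KL divergence stands in for the paper's training
    objective, which the abstract does not specify.
    """
    log_p = log_softmax(logits_orig)
    log_q = log_softmax(logits_cf)
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return float(np.mean(kl))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical visual features: batch of 4 observations, 8 dimensions each.
    env_a = rng.normal(size=(4, 8))
    env_b = rng.normal(size=(4, 8))
    # Toy linear policy mapping features to logits over 3 actions.
    policy = rng.normal(size=(8, 3))

    mixed = counterfactual_features(env_a, env_b, u=0.8)
    penalty = action_stability_penalty(env_a @ policy, mixed @ policy)
    print(f"stability penalty: {penalty:.4f}")
```

In a training loop, a penalty of this kind would typically be added to the usual navigation loss, so that the policy is discouraged from changing its action when only the blended-in part of the observation changes.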
