Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views
Nanbo Li, Cian Eastwood, Robert Fisher
Spotlight presentation: Orals & Spotlights Track 15: COVID/Applications/Composition
on 2020-12-09T08:10:00-08:00 - 2020-12-09T08:20:00-08:00
Abstract: Learning object-centric representations of multi-object scenes is a promising approach towards machine intelligence, facilitating high-level reasoning and control from visual sensory data. However, current approaches for unsupervised object-centric scene representation are incapable of aggregating information from multiple observations of a scene. As a result, these "single-view" methods form their representations of a 3D scene based only on a single 2D observation (view). Naturally, this leads to several inaccuracies, with these methods falling victim to single-view spatial ambiguities. To address this, we propose the Multi-View and Multi-Object Network (MulMON), a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views. To sidestep the main technical difficulty of the multi-object-multi-view scenario, maintaining object correspondences across views, MulMON iteratively updates the latent object representations for a scene over multiple views. To ensure that these iterative updates do indeed aggregate spatial information into a complete 3D scene understanding, MulMON is asked to predict the appearance of the scene from novel viewpoints during training. Through experiments we show that MulMON better resolves spatial ambiguities than single-view methods, learning more accurate and disentangled object representations, and also achieves new functionality in predicting object segmentations for novel viewpoints.
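To make the core idea concrete, below is a minimal sketch (not the authors' code) of the iterative multi-view update described in the abstract: K per-object latents are refined view by view, with the posterior after one view acting as the prior for the next, and the aggregated latents are then decoded from a held-out (novel) viewpoint. All names, network sizes, the residual update rule, and the toy loss are illustrative assumptions.

    # Sketch of MulMON-style multi-view aggregation (illustrative, PyTorch).
    import torch
    import torch.nn as nn

    K, D_LAT, D_OBS, D_VIEW = 4, 16, 32, 3   # object slots, latent dim, observation dim, viewpoint dim

    class Refiner(nn.Module):
        """Updates each object latent given the current latent, an observation, and its viewpoint."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(D_LAT + D_OBS + D_VIEW, 64), nn.ReLU(),
                                     nn.Linear(64, D_LAT))
        def forward(self, z, obs, view):
            inp = torch.cat([z, obs.expand(K, -1), view.expand(K, -1)], dim=-1)
            return z + self.net(inp)          # residual update keeps object slots aligned across views

    class Decoder(nn.Module):
        """Renders a (toy) observation for a query viewpoint from the aggregated object latents."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(D_LAT + D_VIEW, 64), nn.ReLU(),
                                     nn.Linear(64, D_OBS))
        def forward(self, z, view):
            per_object = self.net(torch.cat([z, view.expand(K, -1)], dim=-1))
            return per_object.sum(dim=0)      # naive composition of per-object renders

    refiner, decoder = Refiner(), Decoder()
    z = torch.zeros(K, D_LAT)                 # initial object latents (shared prior)

    # Observation phase: aggregate information from several views of the same scene.
    views = [torch.randn(D_VIEW) for _ in range(3)]
    obs   = [torch.randn(D_OBS) for _ in range(3)]
    for v, x in zip(views, obs):
        z = refiner(z, x, v)                  # posterior after view v becomes prior for the next view

    # Prediction phase: decode the scene from a novel viewpoint, as used for training supervision.
    query_view = torch.randn(D_VIEW)
    prediction = decoder(z, query_view)
    loss = ((prediction - torch.randn(D_OBS)) ** 2).mean()   # placeholder novel-view reconstruction loss
    loss.backward()

Forcing the decoder to predict unseen viewpoints is what pushes the latents to encode genuinely 3D, object-level information rather than a single 2D view.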