
Learning to perceive objects by prediction
Tushar Arora · Li Erran Li · Mingbo Cai

Mon Dec 13 07:10 AM -- 07:20 AM (PST)
Event URL: https://openreview.net/forum?id=lIQ45G9P_zh

The representation of objects is the building block of higher-level concepts. Infants develop the notion of objects without supervision, and the prediction error of future sensory input is likely their major teaching signal. Inspired by this, we propose a new framework that extracts object-centric representations from single 2D images by learning to predict future scenes in the presence of moving objects. We treat objects as latent causes whose function, for an agent, is to facilitate efficient prediction of the coherent motion of their parts in visual input. Distinct from previous object-centric models, our model learns to explicitly infer objects' locations in a 3D environment in addition to segmenting objects. Further, the network learns a latent code space in which objects with the same geometric shape and texture/color frequently group together. The model requires no supervision or pre-training of any part of the network. We created a new synthetic dataset with more complex textures on objects and backgrounds, and found that several previous models not based on predictive learning rely too heavily on clustering colors and lose specificity in object segmentation. Our work demonstrates a new approach for learning symbolic representations grounded in sensation and action.

Author Information

Tushar Arora (The University of Tokyo)
Li Erran Li (AWS AI, Amazon)

Li Erran Li is the head of machine learning at Scale and an adjunct professor at Columbia University. Previously, he was chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and machine learning platform team at Uber where he worked on deep learning for autonomous driving, led the machine learning platform team technically, and drove strategy for company-wide artificial intelligence initiatives. He started his career at Bell Labs. Li’s current research interests are machine learning, computer vision, learning-based robotics, and their application to autonomous driving. He has a PhD from the computer science department at Cornell University. He’s an ACM Fellow and IEEE Fellow.

Mingbo Cai (University of Tokyo)
