
Object Representations Guided By Optical Flow
Jianing Qian · Dinesh Jayaraman

Objects are powerful abstractions for representing the complexity of the world, and many computer vision tasks focus on learning to understand objects and their properties in images from annotated examples. Spurred by advances in unsupervised visual representation learning, there is growing interest in learning object-centric image representations without manual object annotations, through reconstruction and contrastive losses. We observe that these existing approaches fail to effectively exploit a long-known key signal for grouping object pixels, namely, motion in time. To address this, we propose to guide object representations during training to be consistent with optical flow correspondences between consecutive images in video sequences of moving objects. At test time, our approach generates object representations of individual images without requiring any correspondences. Through experiments across three datasets, including a real-world robotic manipulation dataset, we demonstrate that our method consistently outperforms prior approaches, including those that have access to additional information.
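The abstract describes training object representations so that pixel-to-object assignments at one frame, carried along optical flow, agree with the assignments at the next frame. A minimal NumPy sketch of such a flow-consistency objective follows; the function names, nearest-neighbor warping, and squared-error penalty are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def warp_by_flow(masks, flow):
    """Warp per-pixel object masks from frame t toward frame t+1 using forward flow.

    masks: (K, H, W) soft object-assignment maps for frame t.
    flow:  (H, W, 2) forward optical flow (dx, dy) from frame t to t+1.
    Returns a (K, H, W) array of warped masks. Nearest-neighbor scatter is
    used for simplicity; collisions overwrite, so this is only a sketch.
    """
    K, H, W = masks.shape
    warped = np.zeros_like(masks)
    ys, xs = np.mgrid[0:H, 0:W]
    # Round flow targets to the nearest pixel and clip to the image bounds.
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    warped[:, yt, xt] = masks[:, ys, xs]
    return warped

def flow_consistency_loss(masks_t, masks_t1, flow):
    """Mean squared disagreement between flow-warped masks at t and masks at t+1."""
    return float(np.mean((warp_by_flow(masks_t, flow) - masks_t1) ** 2))
```

In a training loop, a loss of this form would be added to the usual reconstruction or contrastive objective; flow is needed only during training, matching the paper's claim that no correspondences are required at test time.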

Author Information

Jianing Qian (University of Pennsylvania)
Dinesh Jayaraman (University of Pennsylvania)

I am an assistant professor at UPenn’s GRASP lab. I lead the Perception, Action, and Learning (PAL) Research Group, where we work on problems at the intersection of computer vision, machine learning, and robotics.
