The objective of this paper is a model that can discover, track and segment multiple moving objects in a video. We make four contributions. First, we introduce an object-centric segmentation model with a depth-ordered layer representation. This is implemented using a variant of the transformer architecture that ingests optical flow, where each query vector specifies an object and its layer for the entire video. The model can effectively discover multiple moving objects and handle mutual occlusions. Second, we introduce a scalable pipeline for generating multi-object synthetic training data via layer compositions, which is used to train the proposed model, significantly reducing the requirements for labour-intensive annotations and supporting Sim2Real generalisation. Third, we conduct thorough ablation studies, showing that the model is able to learn object permanence and temporal shape consistency, and is able to predict amodal segmentation masks. Fourth, we evaluate our model, trained only on synthetic data, on standard video segmentation benchmarks (DAVIS, MoCA, SegTrack, FBMS-59) and achieve state-of-the-art performance among existing methods that do not rely on any manual annotations. With test-time adaptation, we observe further performance boosts.
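To make the idea of a depth-ordered layer composition concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each object has a full (amodal) binary mask and that layers are listed front to back. The visible (modal) mask of each object is then whatever remains after removing pixels covered by layers in front of it. The function name `compose_layers` and the list-of-lists mask format are hypothetical choices made for illustration only.

```python
from typing import List

# A mask is a grid of booleans: mask[y][x] is True where the object is present.
Mask = List[List[bool]]


def compose_layers(amodal_masks: List[Mask]) -> List[Mask]:
    """Derive visible (modal) masks from full (amodal) masks.

    Assumption for this sketch: amodal_masks[0] is the front-most layer,
    and all masks share the same height and width.
    """
    if not amodal_masks:
        return []
    height = len(amodal_masks[0])
    width = len(amodal_masks[0][0]) if height else 0

    # Pixels already claimed by a layer closer to the camera.
    occupied = [[False] * width for _ in range(height)]
    modal_masks: List[Mask] = []

    for amodal in amodal_masks:
        # An object is visible only where no nearer layer covers it.
        visible = [
            [amodal[y][x] and not occupied[y][x] for x in range(width)]
            for y in range(height)
        ]
        modal_masks.append(visible)
        # Mark this layer's full extent as occupied for the layers behind it.
        for y in range(height):
            for x in range(width):
                occupied[y][x] = occupied[y][x] or amodal[y][x]

    return modal_masks


# Two objects on a 1x4 image; the front object partly occludes the back one.
front = [[True, True, False, False]]
back = [[False, True, True, False]]
modal = compose_layers([front, back])
```

In this toy case the back object keeps only the pixel that the front object does not cover, while its amodal mask still records its full extent. This is the distinction between modal and amodal masks that the abstract refers to.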
Author Information
Junyu Xie (University of Oxford)
Weidi Xie (University of Oxford)
Andrew Zisserman (DeepMind & University of Oxford)
More from the Same Authors
-
2021 : PASS: An ImageNet replacement for self-supervised pretraining without humans »
Yuki Asano · Christian Rupprecht · Andrew Zisserman · Andrea Vedaldi -
2021 : 3D Spinal Column Segmentation with Single Plane 2D-Projected Annotations »
Rhydian Windsor · Amir Jamaludin · Timor Kadir · Andrew Zisserman -
2022 Spotlight: Lightning Talks 6A-3 »
Junyu Xie · Chengliang Zhong · Ali Ayub · Sravanti Addepalli · Harsh Rangwani · Jiapeng Tang · Yuchen Rao · Zhiying Jiang · Yuqi Wang · Xingzhe He · Gene Chou · Ilya Chugunov · Samyak Jain · Yuntao Chen · Weidi Xie · Sumukh K Aithal · Carter Fendley · Lev Markhasin · Yiqin Dai · Peixing You · Bastian Wandt · Yinyu Nie · Helge Rhodin · Felix Heide · Ji Xin · Angela Dai · Andrew Zisserman · Bi Wang · Xiaoxue Chen · Mayank Mishra · Zhao-Xiang Zhang · Venkatesh Babu R · Justus Thies · Ming Li · Hao Zhao · Venkatesh Babu R · Jimmy Lin · Fuchun Sun · Matthias Niessner · Guyue Zhou · Xiaodong Mu · Chuang Gan · Wenbing Huang -
2022 Spotlight: Segmenting Moving Objects via an Object-Centric Layered Representation »
Junyu Xie · Weidi Xie · Andrew Zisserman -
2022 Poster: ReCo: Retrieve and Co-segment for Zero-shot Transfer »
Gyungin Shin · Weidi Xie · Samuel Albanie -
2022 Poster: Associating Objects and Their Effects in Video through Coordination Games »
Erika Lu · Forrester Cole · Weidi Xie · Tali Dekel · Bill Freeman · Andrew Zisserman · Michael Rubinstein -
2022 Poster: Flamingo: a Visual Language Model for Few-Shot Learning »
Jean-Baptiste Alayrac · Jeff Donahue · Pauline Luc · Antoine Miech · Iain Barr · Yana Hasson · Karel Lenc · Arthur Mensch · Katherine Millican · Malcolm Reynolds · Roman Ring · Eliza Rutherford · Serkan Cabi · Tengda Han · Zhitao Gong · Sina Samangooei · Marianne Monteiro · Jacob L Menick · Sebastian Borgeaud · Andy Brock · Aida Nematzadeh · Sahand Sharifzadeh · Mikołaj Bińkowski · Ricardo Barreira · Oriol Vinyals · Andrew Zisserman · Karén Simonyan -
2022 Poster: TAP-Vid: A Benchmark for Tracking Any Point in a Video »
Carl Doersch · Ankush Gupta · Larisa Markeeva · Adria Recasens · Lucas Smaira · Yusuf Aytar · Joao Carreira · Andrew Zisserman · Yi Yang -
2020 Poster: CrossTransformers: spatially-aware few-shot transfer »
Carl Doersch · Ankush Gupta · Andrew Zisserman -
2020 Poster: Self-supervised Co-Training for Video Representation Learning »
Tengda Han · Weidi Xie · Andrew Zisserman -
2020 Poster: Self-Supervised MultiModal Versatile Networks »
Jean-Baptiste Alayrac · Adria Recasens · Rosalia Schneider · Relja Arandjelović · Jason Ramapuram · Jeffrey De Fauw · Lucas Smaira · Sander Dieleman · Andrew Zisserman -
2019 Poster: Unsupervised Learning of Object Keypoints for Perception and Control »
Tejas Kulkarni · Ankush Gupta · Catalin Ionescu · Sebastian Borgeaud · Malcolm Reynolds · Andrew Zisserman · Volodymyr Mnih -
2019 Poster: Sim2real transfer learning for 3D human pose estimation: motion to the rescue »
Carl Doersch · Andrew Zisserman -
2018 Poster: Learning to Navigate in Cities Without a Map »
Piotr Mirowski · Matt Grimes · Mateusz Malinowski · Karl Moritz Hermann · Keith Anderson · Denis Teplyashin · Karen Simonyan · Koray Kavukcuoglu · Andrew Zisserman · Raia Hadsell -
2015 Poster: Spatial Transformer Networks »
Max Jaderberg · Karen Simonyan · Andrew Zisserman · Koray Kavukcuoglu -
2015 Spotlight: Spatial Transformer Networks »
Max Jaderberg · Karen Simonyan · Andrew Zisserman · Koray Kavukcuoglu -
2014 Poster: Two-Stream Convolutional Networks for Action Recognition in Videos »
Karen Simonyan · Andrew Zisserman -
2014 Spotlight: Two-Stream Convolutional Networks for Action Recognition in Videos »
Karen Simonyan · Andrew Zisserman -
2013 Poster: Deep Fisher Networks for Large-Scale Image Classification »
Karen Simonyan · Andrea Vedaldi · Andrew Zisserman -
2013 Spotlight: Deep Fisher Networks for Large-Scale Image Classification »
Karen Simonyan · Andrea Vedaldi · Andrew Zisserman -
2011 Poster: Pylon Model for Semantic Segmentation »
Victor Lempitsky · Andrea Vedaldi · Andrew Zisserman -
2010 Poster: Simultaneous Object Detection and Ranking with Weak Supervision »
Matthew B Blaschko · Andrea Vedaldi · Andrew Zisserman -
2010 Spotlight: Learning To Count Objects in Images »
Victor Lempitsky · Andrew Zisserman -
2010 Poster: Learning To Count Objects in Images »
Victor Lempitsky · Andrew Zisserman -
2009 Poster: Segmenting Scenes by Matching Image Composites »
Bryan C Russell · Alexei A Efros · Josef Sivic · Bill Freeman · Andrew Zisserman -
2009 Poster: Structured output regression for detection with partial truncation »
Andrea Vedaldi · Andrew Zisserman -
2008 Poster: SDL: Supervised Dictionary Learning »
Julien Mairal · Francis Bach · Jean A Ponce · Guillermo Sapiro · Andrew Zisserman -
2007 Spotlight: Learning Visual Attributes »
Vittorio Ferrari · Andrew Zisserman -
2007 Poster: Learning Visual Attributes »
Vittorio Ferrari · Andrew Zisserman -
2006 Poster: Bayesian Image Super-resolution, Continued »
Lyndsey C Pickup · David Capel · Stephen J Roberts · Andrew Zisserman -
2006 Spotlight: Bayesian Image Super-resolution, Continued »
Lyndsey C Pickup · David Capel · Stephen J Roberts · Andrew Zisserman