Our goal is to predict future video frames given a sequence of input frames. Despite large amounts of video data, this remains a challenging task because of the high dimensionality of video frames. We address this challenge by proposing the Decompositional Disentangled Predictive Auto-Encoder (DDPAE), a framework that combines structured probabilistic models and deep networks to automatically (i) decompose the high-dimensional video that we aim to predict into components, and (ii) disentangle each component to have low-dimensional temporal dynamics that are easier to predict. Crucially, with an appropriately specified generative model of video frames, DDPAE is able to learn both the latent decomposition and disentanglement without explicit supervision. For the Moving MNIST dataset, we show that DDPAE is able to recover the underlying components (individual digits) and disentanglement (appearance and location) as a human intuitively would. We further demonstrate that DDPAE can be applied to the Bouncing Balls dataset, which involves complex interactions between multiple objects, to predict video frames directly from pixels and recover physical states without explicit supervision.
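The decomposition and disentanglement idea in the abstract can be illustrated with a toy example. The sketch below is not the DDPAE model, which learns these factors with deep networks and no supervision. It only shows the kind of generative assumption described: each frame is composed of a few components, and each component has a fixed appearance plus a low-dimensional, time-varying location. The function names `render_component` and `compose_video`, the additive composition, the patch sizes, and the trajectories are all hypothetical choices made for illustration.

```python
import numpy as np


def render_component(appearance, position, canvas_size):
    """Paste a small appearance patch onto a blank canvas at an integer (row, col)."""
    canvas = np.zeros((canvas_size, canvas_size), dtype=np.float32)
    row, col = position
    height, width = appearance.shape
    canvas[row:row + height, col:col + width] = appearance
    return canvas


def compose_video(appearances, trajectories, canvas_size):
    """Sum per-component renderings into frames.

    Each component keeps the same appearance in every frame; only its
    2-D position changes over time (assumed additive composition).
    """
    num_frames = len(trajectories[0])
    video = np.zeros((num_frames, canvas_size, canvas_size), dtype=np.float32)
    for appearance, trajectory in zip(appearances, trajectories):
        for t, position in enumerate(trajectory):
            video[t] += render_component(appearance, position, canvas_size)
    return np.clip(video, 0.0, 1.0)


# Two hypothetical components standing in for two moving digits.
appearances = [
    np.ones((2, 2), dtype=np.float32),
    0.5 * np.ones((2, 2), dtype=np.float32),
]
trajectories = [
    [(0, t) for t in range(4)],  # first component moves right along the top
    [(t, 5) for t in range(4)],  # second component moves down column 5
]
video = compose_video(appearances, trajectories, canvas_size=8)
print(video.shape)
```

In this toy setting, predicting the next frame reduces to extrapolating two 2-D trajectories instead of 64 pixel values, which is the sense in which per-component dynamics are lower-dimensional than the raw frames.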
Author Information
Jun-Ting Hsieh (Stanford University)
Bingbin Liu (Stanford University)
De-An Huang (Stanford University)
Li Fei-Fei (Stanford University & Google)
Juan Carlos Niebles (Stanford University)
More from the Same Authors
- 2021: Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
  Kaylee Burns · Christopher D Manning · Li Fei-Fei
- 2021: What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
  Ajay Mandlekar · Danfei Xu · Josiah Wong · Chen Wang · Li Fei-Fei · Silvio Savarese · Yuke Zhu · Roberto Martín-Martín
- 2022 Poster: MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing
  Zelun Luo · Zane Durante · Linden Li · Wanze Xie · Ruochen Liu · Emily Jin · Zhuoyi Huang · Lun Yu Li · Jiajun Wu · Juan Carlos Niebles · Ehsan Adeli · Fei-Fei Li
- 2021 Poster: MOMA: Multi-Object Multi-Actor Activity Parsing
  Zelun Luo · Wanze Xie · Siddharth Kapoor · Yiyun Liang · Michael Cooper · Juan Carlos Niebles · Ehsan Adeli · Fei-Fei Li
- 2020: Closing remarks from Fei-Fei Li, Sequoia Professor of Computer Science, Stanford University & Co-Director of Stanford’s Human-Centered AI Institute
  Li Fei-Fei
- 2020: Q/A for invited talk #5
  Li Fei-Fei
- 2020: Creating diverse tasks to catalyze robot learning
  Li Fei-Fei
- 2019 Poster: Regression Planning Networks
  Danfei Xu · Roberto Martín-Martín · De-An Huang · Yuke Zhu · Silvio Savarese · Li Fei-Fei
- 2019 Poster: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
  Sharon Zhou · Mitchell Gordon · Ranjay Krishna · Austin Narcomey · Li Fei-Fei · Michael Bernstein
- 2019 Oral: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
  Sharon Zhou · Mitchell Gordon · Ranjay Krishna · Austin Narcomey · Li Fei-Fei · Michael Bernstein
- 2018 Workshop: NIPS Workshop on Machine Learning for Intelligent Transportation Systems 2018
  Li Erran Li · Anca Dragan · Juan Carlos Niebles · Silvio Savarese
- 2018 Poster: Learning to Play With Intrinsically-Motivated, Self-Aware Agents
  Nick Haber · Damian Mrowca · Stephanie Wang · Li Fei-Fei · Daniel Yamins
- 2018 Poster: Flexible neural representation for physics prediction
  Damian Mrowca · Chengxu Zhuang · Elias Wang · Nick Haber · Li Fei-Fei · Josh Tenenbaum · Daniel Yamins
- 2017 Workshop: 2017 NIPS Workshop on Machine Learning for Intelligent Transportation Systems
  Li Erran Li · Anca Dragan · Juan Carlos Niebles · Silvio Savarese
- 2017: Keynote II: Fei-Fei Li, Stanford
  Li Fei-Fei
- 2017 Poster: Label Efficient Learning of Transferable Representations across Domains and Tasks
  Zelun Luo · Yuliang Zou · Judy Hoffman · Li Fei-Fei
- 2016: Knowledge Acquisition for Visual Question Answering via Iterative Querying
  Yuke Zhu · Joseph Lim · Li Fei-Fei
- 2016: Invited Talk: Visual Understanding of Human Activities for Smart Vehicles and Interactive Environments (Juan Carlos Niebles, Stanford)
  Juan Carlos Niebles
- 2014 Poster: Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
  Andrej Karpathy · Armand Joulin · Li Fei-Fei
- 2012 Workshop: Big Data Meets Computer Vision: First International Workshop on Large Scale Visual Recognition and Retrieval
  Jia Deng · Samy Bengio · Yuanqing Lin · Li Fei-Fei
- 2012 Poster: Shifting Weights: Adapting Object Detectors from Image to Video
  Kevin Tang · Vignesh Ramanathan · Li Fei-Fei · Daphne Koller
- 2012 Demonstration: EVA: Engine for Visual Annotation
  Jia Deng · Jonathan Krause · Zhiheng Huang · Alexander C Berg · Li Fei-Fei
- 2011 Poster: Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
  Jia Deng · Sanjeev Satheesh · Alexander C Berg · Li Fei-Fei
- 2011 Poster: Large-Scale Category Structure Aware Image Categorization
  Bin Zhao · Li Fei-Fei · Eric Xing
- 2010 Session: Oral Session 10
  Li Fei-Fei
- 2010 Poster: Large Margin Learning of Upstream Scene Understanding Models
  Jun Zhu · Li-Jia Li · Li Fei-Fei · Eric Xing
- 2010 Poster: Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
  Li-Jia Li · Hao Su · Eric Xing · Li Fei-Fei