We present DRNET, a new model that learns disentangled image representations from video. Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation that factorizes each frame into a stationary part and a temporally varying component. The disentangled representation can be used for a range of tasks. For example, applying a standard LSTM to the time-varying components enables prediction of future frames. We evaluate our approach on a range of synthetic and real videos. For the latter, we demonstrate the ability to coherently generate up to several hundred steps into the future.
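The factorization described in the abstract can be sketched in code. Below is a minimal, illustrative PyTorch sketch, not the paper's architecture: the MLP encoders, layer sizes, and the `DRNETSketch` class name are all assumptions for demonstration (the paper uses convolutional encoders/decoders and an adversarial loss on the pose codes, which are omitted here). It shows only the core idea: a content code held fixed across the clip, a per-frame pose code, and an LSTM rolled forward over pose codes to generate future frames.

```python
import torch
import torch.nn as nn


class DRNETSketch(nn.Module):
    """Illustrative sketch of the DRNET factorization (not the paper's model).

    Each frame is split into a content code (stationary over the clip) and a
    pose code (time-varying). A standard LSTM over the pose codes predicts
    future poses, which are decoded against the fixed content code.
    """

    def __init__(self, frame_dim=64, content_dim=16, pose_dim=8, hidden=32):
        super().__init__()
        # Placeholder MLP encoders; the paper uses convolutional networks.
        self.content_enc = nn.Sequential(
            nn.Linear(frame_dim, hidden), nn.ReLU(), nn.Linear(hidden, content_dim))
        self.pose_enc = nn.Sequential(
            nn.Linear(frame_dim, hidden), nn.ReLU(), nn.Linear(hidden, pose_dim))
        self.decoder = nn.Sequential(
            nn.Linear(content_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, frame_dim))
        # LSTM applied to the time-varying (pose) components only.
        self.lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.pose_out = nn.Linear(hidden, pose_dim)

    def predict(self, frames, n_future):
        """frames: (batch, time, frame_dim) -> (batch, n_future, frame_dim)."""
        # Content is taken from the last observed frame and held fixed.
        content = self.content_enc(frames[:, -1])
        # Condition the LSTM on the observed pose sequence.
        poses = self.pose_enc(frames)
        out, state = self.lstm(poses)
        pose = self.pose_out(out[:, -1])
        preds = []
        for _ in range(n_future):
            # Decode each predicted pose against the same content code.
            preds.append(self.decoder(torch.cat([content, pose], dim=-1)))
            out, state = self.lstm(pose.unsqueeze(1), state)
            pose = self.pose_out(out[:, -1])
        return torch.stack(preds, dim=1)
```

Because the content code never changes during rollout, long generations cannot drift in appearance, only in pose, which is consistent with the multi-hundred-step generations reported above.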
Emily Denton (New York University)
Emily Denton is a Research Scientist at Google, where they examine the societal impacts of AI technology. Their recent research centers on critically examining the norms, values, and work practices that structure the development and use of machine learning datasets. Prior to joining Google, Emily received their PhD in machine learning from the Courant Institute of Mathematical Sciences at New York University, where they focused on unsupervised learning and generative modeling of images and video.
Vighnesh Birodkar (New York University)
Related Events (a corresponding poster, oral, or spotlight)
2017 Spotlight: Unsupervised Learning of Disentangled Representations from Video »
Thu Dec 7th 01:55 -- 02:00 AM Room Hall C
More from the Same Authors
2021 Tutorial: Beyond Fairness in Machine Learning »
Timnit Gebru · Emily Denton
2017 Workshop: Learning Disentangled Features: from Perception to Control »
Emily Denton · Siddharth Narayanaswamy · Tejas Kulkarni · Honglak Lee · Diane Bouchacourt · Josh Tenenbaum · David Pfau
2015 Poster: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks »
Emily Denton · Soumith Chintala · Arthur Szlam · Rob Fergus
2014 Poster: Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation »
Emily Denton · Wojciech Zaremba · Joan Bruna · Yann LeCun · Rob Fergus