Learning Disentangled Representations of Videos with Missing Data
Armand Comas · Chi Zhang · Zlatan Feric · Octavia Camps · Rose Yu

Tue Dec 08 09:00 PM -- 11:00 PM (PST) @ Poster Session 2 #620

Missing data poses significant challenges when learning representations of video sequences. We present the Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object, while imputing each object's trajectory where data is missing. On a moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code is available at https://github.com/Rose-STL-Lab/DIVE.

Author Information

Armand Comas (Northeastern University)

Passionate about video neurosymbolic representation learning, object-oriented learning, relational inference, abstract reasoning, causality and dynamics. But I'll enjoy discussing any topic!

Chi Zhang (Northeastern University)
Zlatan Feric (Northeastern University)
Octavia Camps (Northeastern University)
Rose Yu (University of California, San Diego)
