

Poster
in
Workshop: Deep Generative Models and Downstream Applications

Stochastic Video Prediction with Perceptual Loss

Donghun Lee · Ingook Jang · Seonghyun Kim · Chanwon Park · JUN HEE PARK


Abstract:

Predicting future states is a challenging part of a decision-making system because of its inherently uncertain nature. Most work in this literature is based on deep generative networks such as variational autoencoders, which use pixel-wise reconstruction in their loss functions. Predicting the future with pixel-wise reconstruction alone can fail to capture the full distribution of high-level representations and results in inaccurate, blurred predictions. In this paper, we propose stochastic video generation with perceptual loss (SVG-PL) to reduce uncertainty and blur in future prediction. The proposed model combines a perceptual loss function and a pixel-wise loss function for image reconstruction and future-state prediction. The model is built on a variational autoencoder that maps high-dimensional inputs to a latent variable, capturing both the spatial information and the temporal dynamics needed for future prediction. We show that using perceptual loss for video prediction improves reconstruction ability and yields sharper predictions. Improvements in video prediction could further help the decision-making process in multiple downstream applications.
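The combined objective described above can be sketched as a weighted sum of a pixel-wise reconstruction term and a perceptual term computed in the feature space of a pretrained network. The sketch below is a minimal illustration, not the authors' implementation: the feature extractor is a stand-in for a real pretrained network (e.g. VGG features), and the weights `lam_pixel` and `lam_perc` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_features(x, w):
    # Stand-in for a pretrained perceptual feature extractor (e.g. VGG);
    # here just a fixed random projection followed by a nonlinearity.
    return np.tanh(x.reshape(x.shape[0], -1) @ w)

def combined_loss(x, x_hat, w, lam_pixel=1.0, lam_perc=0.1):
    # Pixel-wise reconstruction term (MSE over raw pixels).
    pixel = np.mean((x - x_hat) ** 2)
    # Perceptual term: MSE between feature representations of the
    # ground-truth frame and the predicted frame.
    perc = np.mean((toy_features(x, w) - toy_features(x_hat, w)) ** 2)
    return lam_pixel * pixel + lam_perc * perc

# Toy batch of 4 predicted "frames" of size 8x8.
x = rng.standard_normal((4, 8, 8))
x_hat = x + 0.1 * rng.standard_normal((4, 8, 8))  # slightly noisy prediction
w = rng.standard_normal((64, 16))                 # fixed feature weights

print(combined_loss(x, x_hat, w) > 0.0)  # imperfect prediction -> positive loss
```

In a full SVG-style model this loss would be added to the KL term of the variational autoencoder; the perceptual term penalizes mismatches in high-level structure that a pure pixel loss averages away, which is the source of the blur the abstract mentions.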
