In this paper, we present a method to disentangle appearance and structural information in the latent space of StyleGAN. We train an autoencoder whose encoder extracts appearance and structural features from an input latent code; the decoder then reconstructs the original input. To train this network, we propose a video-based latent contrastive learning framework. Based on the observation that the appearance of a face does not change within a short video, the encoder learns to pull the appearance representations of different video frames together while pushing the appearance representations of different faces apart. Similarly, the structural representations of augmented versions of the same frame are pulled together, while representations across different frames are pushed apart. As face video datasets lack a sufficient number of unique identities, we propose a method to synthetically generate videos, allowing our disentangling network to observe a larger variation of appearances, expressions, and poses during training. We evaluate our approach on the tasks of expression transfer in images and motion transfer in videos.
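The pull-together/push-apart objective described in the abstract is commonly realized with an InfoNCE-style contrastive loss. The sketch below is a hypothetical illustration, not the authors' implementation: `anchors` would hold appearance embeddings of one set of frames, `positives` the embeddings of other frames from the same videos (matching row for row), with the remaining rows in the batch acting as negatives (different faces). The function name, batch layout, and temperature value are assumptions for illustration only.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (hypothetical sketch).

    Row i of `positives` is the positive pair for row i of `anchors`;
    every other row in the batch serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # correct (same-identity) pairings lie on the diagonal
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls each anchor toward its matching positive and away from the other rows in the batch; the same loss form could be reused for the structural branch, with augmented versions of the same frame as positives and other frames as negatives.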
Author Information
Kevin Duarte (University of Central Florida)
Wei-An Lin (Adobe Systems)
Ratheesh Kalarot (Adobe Systems)
Jingwan (Cynthia) Lu (Adobe Research)
Jingwan has a passion for data-driven content creation. Her primary research focus is applying deep generative models to photography applications. Her vision is to harness the power of machine learning in the age of data explosion to invent the next generation of image and video editing tools. She has also worked on brush models, stylization, guided texture synthesis, voice synthesis, and more, using various data-driven approaches.
Eli Shechtman (Adobe)
Shabnam Ghadar (Adobe Systems)
Mubarak Shah (University of Central Florida)
More from the Same Authors
- 2022: Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks
  Saeed Vahidian · Mahdi Morafah · Chen Chen · Mubarak Shah · Bill Lin
- 2022 Workshop: Vision Transformers: Theory and applications
  Fahad Shahbaz Khan · Gul Varol · Salman Khan · Ping Luo · Rao Anwer · Ashish Vaswani · Hisham Cholakkal · Niki Parmar · Joost van de Weijer · Mubarak Shah
- 2022 Poster: Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
  Ziwei Xu · Yogesh Rawat · Yongkang Wong · Mohan Kankanhalli · Mubarak Shah
- 2021 Poster: Reformulating Zero-shot Action Recognition for Multi-label Actions
  Alec Kerrigan · Kevin Duarte · Yogesh Rawat · Mubarak Shah
- 2020 Poster: Few-shot Image Generation with Elastic Weight Consolidation
  Yijun Li · Richard Zhang · Jingwan (Cynthia) Lu · Eli Shechtman
- 2018 Poster: VideoCapsuleNet: A Simplified Network for Action Detection
  Kevin Duarte · Yogesh Rawat · Mubarak Shah