Poster
How Transferable are Video Representations Based on Synthetic Data?
Yo-whan Kim · Samarth Mishra · SouYoung Jin · Rameswar Panda · Hilde Kuehne · Leonid Karlinsky · Venkatesh Saligrama · Kate Saenko · Aude Oliva · Rogerio Feris

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #1033

Action recognition has improved dramatically with massive-scale video datasets. Yet these datasets come with issues related to curation cost, privacy, ethics, bias, and copyright. By comparison, little effort has been devoted to exploring the potential of synthetic video data. In this work, as a stepping stone toward addressing these shortcomings, we study the transferability of video representations learned solely from synthetically generated video clips instead of real data. We propose SynAPT, a novel benchmark for action recognition based on a combination of existing synthetic datasets, in which a model is pre-trained on synthetic videos rendered by various graphics simulators and then transferred to a set of downstream action recognition datasets containing different categories than the synthetic data. We provide an extensive baseline analysis on SynAPT, revealing that the simulation-to-real gap is minor for datasets with low object and scene bias, where models pre-trained with synthetic data even outperform their real-data counterparts. We posit that the gap between real and synthetic action representations can be attributed to contextual bias and static objects related to the action, rather than the temporal dynamics of the action itself. The SynAPT benchmark is available at https://github.com/mintjohnkim/SynAPT.

Author Information

Yo-whan Kim (Massachusetts Institute of Technology)
Samarth Mishra (Boston University)
SouYoung Jin (Dartmouth College)
Rameswar Panda (MIT-IBM Watson AI Lab)
Hilde Kuehne (Goethe University Frankfurt, MIT-IBM Watson AI Lab)
Hilde Kuehne

Prof. Dr. Hilde Kuehne is Head of Computer Vision and Machine Learning at the Computational Vision & Artificial Intelligence Group at the Goethe University Frankfurt and an affiliated professor at the MIT-IBM Watson AI Lab. Her research focuses on weakly supervised and unsupervised recognition and understanding of video data. She obtained her doctoral degree in engineering from the Karlsruhe Institute of Technology (KIT) in 2014. Her experience includes projects with various European and US universities and international technology companies, with a focus on image and video understanding. She has published high-impact work in the field, including the HMDB action classification dataset. She has organized various workshops in the field and served as area chair for CVPR, ICCV, and WACV. Beyond her work, she is committed to bringing more diversity to STEM.

Leonid Karlinsky (Weizmann Institute of Science)
Venkatesh Saligrama (Boston University)
Kate Saenko (Boston University & MIT-IBM Watson AI Lab, IBM Research)
Aude Oliva (Massachusetts Institute of Technology)
Rogerio Feris (MIT-IBM Watson AI Lab, IBM Research)

More from the Same Authors