How Transferable are Video Representations Based on Synthetic Data?
Yo-whan Kim · Samarth Mishra · SouYoung Jin · Rameswar Panda · Hilde Kuehne · Leonid Karlinsky · Venkatesh Saligrama · Kate Saenko · Aude Oliva · Rogerio Feris

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #1033

Action recognition has improved dramatically with massive-scale video datasets. Yet these datasets come with issues related to curation cost, privacy, ethics, bias, and copyright. By comparison, little effort has been devoted to exploring the potential of synthetic video data. In this work, as a stepping stone towards addressing these shortcomings, we study the transferability of video representations learned solely from synthetically generated video clips instead of real data. We propose SynAPT, a novel benchmark for action recognition based on a combination of existing synthetic datasets, in which a model is pre-trained on synthetic videos rendered by various graphics simulators and then transferred to a set of downstream action recognition datasets containing different categories than the synthetic data. We provide an extensive baseline analysis on SynAPT, revealing that the simulation-to-real gap is minor for datasets with low object and scene bias, where models pre-trained with synthetic data even outperform their real-data counterparts. We posit that the gap between real and synthetic action representations can be attributed to contextual bias and static objects related to the action rather than to the temporal dynamics of the action itself. The SynAPT benchmark is available at https://github.com/mintjohnkim/SynAPT.

Author Information

Yo-whan Kim (Massachusetts Institute of Technology)
Samarth Mishra (Boston University)
SouYoung Jin (Dartmouth College)
Rameswar Panda (MIT-IBM Watson AI Lab)
Hilde Kuehne (Goethe University Frankfurt, MIT-IBM Watson AI Lab)
Hilde Kuehne

Prof. Dr. Hilde Kuehne is Head of Computer Vision and Machine Learning at the Computational Vision & Artificial Intelligence Group at the Goethe University Frankfurt and an affiliated professor at the MIT-IBM Watson AI Lab. Her research focuses on weakly supervised and unsupervised recognition and understanding of video data. She obtained her doctoral degree in engineering from the Karlsruhe Institute of Technology (KIT) in 2014. Her experience includes projects with various European and US universities and international technology companies, with a focus on image and video understanding and processing. She has authored various high-impact publications in the field, including the HMDB action classification dataset. She has organized various workshops in the field and served as area chair for CVPR, ICCV, and WACV. Beyond her work, she is committed to bringing more diversity to STEM.

Leonid Karlinsky (Weizmann Institute of Science)
Venkatesh Saligrama (Boston University)
Kate Saenko (Boston University & MIT-IBM Watson AI Lab, IBM Research)
Kate Saenko

Kate is an AI Research Scientist at FAIR, Meta and a Full Professor of Computer Science at Boston University (currently on leave) where she leads the Computer Vision and Learning Group. Kate received a PhD in EECS from MIT and did postdoctoral training at UC Berkeley and Harvard. Her research interests are in Artificial Intelligence with a focus on out-of-distribution learning, dataset bias, domain adaptation, vision and language understanding, and other topics in deep learning.

Past academic positions:
Consulting Professor, MIT-IBM Watson AI Lab, 2019-2022
Assistant Professor, Computer Science Department, UMass Lowell
Postdoctoral Researcher, International Computer Science Institute
Visiting Scholar, UC Berkeley EECS
Visiting Postdoctoral Fellow, SEAS, Harvard University

Aude Oliva (Massachusetts Institute of Technology)
Rogerio Feris (MIT-IBM Watson AI Lab, IBM Research)

More from the Same Authors