Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Madeline Chantry · Shruti Vyas · Hamid Palangi · Yogesh Rawat · Vibhav Vineet

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #1033

Joint visual and language modeling on large-scale datasets has recently shown good progress in multi-modal tasks when compared to single-modal learning. However, the robustness of these approaches against real-world perturbations has not been studied. In this work, we perform the first extensive robustness study of video-language models against various real-world perturbations. We focus on text-to-video retrieval and propose two large-scale benchmark datasets, MSRVTT-P and YouCook2-P, which utilize 90 different visual and 35 different text perturbations. The study reveals some interesting initial findings from the studied models: 1) models are more robust when text is perturbed than when video is perturbed, 2) pre-trained models are more robust than those trained from scratch, and 3) models attend more to scene and objects than to motion and action. We hope this study will serve as a benchmark and guide future research in robust video-language learning. The benchmark introduced in this study, along with the code and datasets, is available at https://bit.ly/3CNOly4.

Author Information

Madeline Chantry (University of Central Florida)

Passionate researcher specializing in deep learning and computer vision. Studied psychology and business at the University of Connecticut, followed by a master's in data analytics from the University of Central Florida (UCF). Worked on big data and machine learning projects for several years in cybersecurity. Currently focused on research as a Graduate Research Assistant in the Center for Research in Computer Vision at UCF. Research focus is on visual-language self-supervised deep learning models. Takes pride in the ability to learn quickly: self-taught in programming and without any prior computer science coursework, passed all graduate-level course requirements to complete the doctoral qualifiers in computer science.

Shruti Vyas (University of Central Florida)
Hamid Palangi (Microsoft Research)
Yogesh Rawat (University of Central Florida)
Vibhav Vineet (Microsoft Research)

More from the Same Authors