We introduce the task of spatially localizing narrated interactions in videos. Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations. To achieve this goal, we propose a multilayer cross-modal attention network that enables effective optimization of a contrastive loss during training. We introduce a divided strategy that alternates between computing inter- and intra-modal attention across the visual and natural language modalities, which allows effective training via directly contrasting the two modalities' representations. We demonstrate the effectiveness of our approach by self-training on the HowTo100M instructional video dataset and evaluating on a newly collected dataset of localized described interactions in the YouCook2 dataset. We show that our approach outperforms alternative baselines, including shallow co-attention and full cross-modal attention. We also apply our approach to grounding phrases in images with weak supervision on Flickr30K and show that stacking multiple attention layers is effective and, when combined with a word-to-region loss, achieves state of the art on recall-at-one and pointing hand accuracies.
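The divided attention strategy and contrastive objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`attend`, `divided_layer`, `encode`, `contrastive_loss`), the inter-then-intra ordering, the mean pooling, the temperature value, and the omission of learned projection weights are all assumptions made to keep the example short.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys):
    # Scaled dot-product attention with a residual connection.
    # Keys double as values; learned projections are omitted (assumption).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return queries + softmax(scores) @ keys

def divided_layer(regions, words):
    # Inter-modal step: each modality attends to the other.
    r = attend(regions, words)
    w = attend(words, regions)
    # Intra-modal step: each modality attends to itself.
    return attend(r, r), attend(w, w)

def encode(regions, words, num_layers=2):
    # Stack several divided layers, then mean-pool and L2-normalize.
    for _ in range(num_layers):
        regions, words = divided_layer(regions, words)
    v, t = regions.mean(axis=0), words.mean(axis=0)
    return v / np.linalg.norm(v), t / np.linalg.norm(t)

def log_softmax(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return x - (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True)))

def contrastive_loss(videos, texts, temperature=0.1):
    # Symmetric InfoNCE-style loss: matching (video, narration) pairs sit on
    # the diagonal of the similarity matrix; all other pairs are negatives.
    n = len(videos)
    sim = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            v, t = encode(videos[i], texts[j])
            sim[i, j] = (v @ t) / temperature
    v2t = np.trace(log_softmax(sim, axis=1))
    t2v = np.trace(log_softmax(sim, axis=0))
    return -(v2t + t2v) / (2 * n)

# Toy usage: 3 clips with 5 region features each, 3 narrations with 4 word
# features each, all 8-dimensional random vectors.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(5, 8)) for _ in range(3)]
texts = [rng.normal(size=(4, 8)) for _ in range(3)]
loss = contrastive_loss(videos, texts)
```

Because the cross-modal attention makes each representation depend on both inputs, this sketch re-encodes every (video, narration) pair before scoring it, which is quadratic in the batch size.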
Author Information
Reuben Tan (Boston University)
Bryan Plummer (Boston University)
Kate Saenko (Boston University & MIT-IBM Watson AI Lab, IBM Research)
Hailin Jin (Adobe)
Bryan Russell (Intel Labs)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
  Thu. Dec 9th 12:30 -- 02:00 AM
More from the Same Authors
- 2021 : Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation
  Aadarsh Sahoo · Rameswar Panda · Rogerio Feris · Kate Saenko · Abir Das
- 2021 : Extending the WILDS Benchmark for Unsupervised Adaptation
  Shiori Sagawa · Pang Wei Koh · Tony Lee · Irena Gao · Sang Michael Xie · Kendrick Shen · Ananya Kumar · Weihua Hu · Michihiro Yasunaga · Henrik Marklund · Sara Beery · Ian Stavness · Jure Leskovec · Kate Saenko · Tatsunori Hashimoto · Sergey Levine · Chelsea Finn · Percy Liang
- 2021 : Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency
  Samarth Mishra · Kate Saenko · Venkatesh Saligrama
- 2022 : Fifteen-minute Competition Overview Video
  Kate Saenko · Samarth Mishra · Dina Bashkirova · Vitaly Ablavsky · Sarah Bargal · Rachel Lai · Piotr Teterwak · James Akl · Fadi Alladkani · Donghyun Kim · Berk Calli
- 2022 Competition: VisDA 2022 Challenge: Sim2Real Domain Adaptation for Industrial Recycling
  Dina Bashkirova · Samarth Mishra · Piotr Teterwak · Donghyun Kim · Rachel Lai · Fadi Alladkani · James Akl · Vitaly Ablavsky · Sarah Bargal · Berk Calli · Kate Saenko
- 2022 : Challenge Introduction
  Dina Bashkirova · Samarth Mishra · Piotr Teterwak · Donghyun Kim · Sarah Bargal · Diala Lteif · Kate Saenko
- 2022 : Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
  Vitali Petsiuk · Alexander E. Siemenn · Saisamrit Surbehera · Qi Qi Chin · Keith Tyser · Gregory Hunter · Arvind Raghavan · Yann Hicke · Bryan Plummer · Ori Kerret · Tonio Buonassisi · Kate Saenko · Armando Solar-Lezama · Iddo Drori
- 2022 Poster: DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations
  Ximeng Sun · Ping Hu · Kate Saenko
- 2022 Poster: Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing
  Nataniel Ruiz · Sarah Bargal · Cihang Xie · Kate Saenko · Stan Sclaroff
- 2022 Poster: How Transferable are Video Representations Based on Synthetic Data?
  Yo-whan Kim · Samarth Mishra · SouYoung Jin · Rameswar Panda · Hilde Kuehne · Leonid Karlinsky · Venkatesh Saligrama · Kate Saenko · Aude Oliva · Rogerio Feris
- 2022 Poster: FETA: Towards Specializing Foundational Models for Expert Task Applications
  Amit Alfassy · Assaf Arbelle · Oshri Halimi · Sivan Harary · Roei Herzig · Eli Schwartz · Rameswar Panda · Michele Dolfi · Christoph Auer · Peter Staar · Kate Saenko · Rogerio Feris · Leonid Karlinsky
- 2021 Workshop: Distribution shifts: connecting methods and applications (DistShift)
  Shiori Sagawa · Pang Wei Koh · Fanny Yang · Hongseok Namkoong · Jiashi Feng · Kate Saenko · Percy Liang · Sarah Bird · Sergey Levine
- 2021 Poster: OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization
  Kuniaki Saito · Donghyun Kim · Kate Saenko
- 2021 Poster: A Multi-Implicit Neural Representation for Fonts
  Pradyumna Reddy · Zhifei Zhang · Zhaowen Wang · Matthew Fisher · Hailin Jin · Niloy Mitra
- 2021 : VisDA21: Visual Domain Adaptation + Q&A
  Kate Saenko · Kuniaki Saito · Donghyun Kim · Samarth Mishra · Ben Usman · Piotr Teterwak · Dina Bashkirova · Dan Hendrycks
- 2021 Poster: Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing
  Aadarsh Sahoo · Rutav Shah · Rameswar Panda · Kate Saenko · Abir Das
- 2020 Poster: Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
  Ben Usman · Avneesh Sud · Nick Dufour · Kate Saenko
- 2020 Poster: Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation
  Ping Hu · Stan Sclaroff · Kate Saenko
- 2020 Poster: Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction
  Tong He · John Collomosse · Hailin Jin · Stefano Soatto
- 2020 Poster: Universal Domain Adaptation through Self Supervision
  Kuniaki Saito · Donghyun Kim · Stan Sclaroff · Kate Saenko
- 2020 Poster: Auxiliary Task Reweighting for Minimum-data Learning
  Baifeng Shi · Judy Hoffman · Kate Saenko · Trevor Darrell · Huijuan Xu
- 2020 Poster: AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
  Ximeng Sun · Rameswar Panda · Rogerio Feris · Kate Saenko
- 2019 Poster: Adversarial Self-Defense for Cycle-Consistent GANs
  Dina Bashkirova · Ben Usman · Kate Saenko
- 2018 Poster: Speaker-Follower Models for Vision-and-Language Navigation
  Daniel Fried · Ronghang Hu · Volkan Cirik · Anna Rohrbach · Jacob Andreas · Louis-Philippe Morency · Taylor Berg-Kirkpatrick · Kate Saenko · Dan Klein · Trevor Darrell
- 2016 : Invited Talk: Domain Adaption for Perception and Action (Kate Saenko, Boston University)
  Kate Saenko
- 2015 Workshop: Transfer and Multi-Task Learning: Trends and New Perspectives
  Anastasia Pentina · Christoph Lampert · Sinno Jialin Pan · Mingsheng Long · Judy Hoffman · Baochen Sun · Kate Saenko