We propose a self-supervised algorithm to learn representations from egocentric video data. Recently, significant efforts have been made to capture humans interacting with their own environments as they go about their daily activities. As a result, several large egocentric datasets of interaction-rich multi-modal data have emerged. However, learning representations from such videos is challenging. First, given the uncurated nature of long-form continuous videos, learning effective representations requires focusing on the moments in time when interactions take place. Second, visual representations of daily activities should be sensitive to changes in the state of the environment, whereas current successful multi-modal learning frameworks encourage representations that are invariant over time. To address these challenges, we leverage audio signals to identify moments of likely interaction, which are more conducive to learning. We also propose a novel self-supervised objective that learns from audible state changes caused by interactions. We validate these contributions extensively on two large-scale egocentric datasets, EPIC-Kitchens-100 and the recently released Ego4D, and show improvements on several downstream tasks, including action recognition, long-term action anticipation, and object state change classification.
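The abstract names two ingredients: using audio to locate moments of likely interaction, and an objective that learns from audible state changes. The following is a minimal NumPy sketch of both under stated assumptions, not the authors' implementation. Moments of interaction are approximated by peaks in short-time audio energy (a simple stand-in heuristic), and the state-change objective is written as an InfoNCE loss that matches the change in a clip's visual embedding (after minus before) to the audio embedding at that moment. All function names, window sizes, thresholds, and the temperature are hypothetical, and the video and audio encoders are omitted: embeddings are passed in as plain arrays.

```python
import numpy as np


def detect_interaction_moments(audio, sr, win=0.1, hop=0.05, min_gap=1.0):
    """Hypothetical heuristic: return times (seconds) of salient peaks in
    short-time audio energy, treated as candidate moments of interaction."""
    win_n, hop_n = int(win * sr), int(hop * sr)
    n_frames = 1 + (len(audio) - win_n) // hop_n
    # Energy of each analysis window.
    energy = np.array([np.sum(audio[i * hop_n:i * hop_n + win_n] ** 2)
                       for i in range(n_frames)])
    # Keep local maxima that stand out from the clip's typical loudness.
    thresh = energy.mean() + energy.std()
    peaks = [i for i in range(1, n_frames - 1)
             if energy[i] > thresh
             and energy[i] >= energy[i - 1] and energy[i] >= energy[i + 1]]
    # Greedy non-maximum suppression: loudest peaks first, min_gap apart.
    selected = []
    for i in sorted(peaks, key=lambda j: -energy[j]):
        t = (i * hop_n + win_n / 2) / sr  # window centre in seconds
        if all(abs(t - s) >= min_gap for s in selected):
            selected.append(t)
    return sorted(selected)


def info_nce(queries, keys, temperature=0.07):
    """InfoNCE: row i of `queries` should match row i of `keys`; all other
    rows in the batch act as negatives."""
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    logits = q @ k.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def state_change_loss(v_before, v_after, a_moment, temperature=0.07):
    """Assumed formulation: the visual change across an interaction
    (after minus before) should be identifiable from the sound it made."""
    return info_nce(v_after - v_before, a_moment, temperature)
```

The change vector is used rather than either visual state alone so that the loss rewards representations that vary across an interaction, which is the property the abstract argues time-invariant multi-modal objectives lack.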
Author Information
Himangi Mittal (Carnegie Mellon University)
Pedro Morgado (University of Wisconsin - Madison)
Unnat Jain (Meta AI Research)
Abhinav Gupta (Facebook AI Research/CMU)
More from the Same Authors
2021 : RB2: Robotic Manipulation Benchmarking with a Twist
Sudeep Dasari · Jianren Wang · Joyce Hong · Shikhar Bahl · Yixin Lin · Austin Wang · Abitha Thankaraj · Karanbir Chahal · Berk Calli · Saurabh Gupta · David Held · Lerrel Pinto · Deepak Pathak · Vikash Kumar · Abhinav Gupta
2021 : KitchenShift: Evaluating Zero-Shot Generalization of Imitation-Based Policy Learning Under Domain Shifts
Eliot Xing · Abhinav Gupta · Samantha Powers · Victoria Dean
2022 : Multispectral Masked Autoencoder for Remote Sensing Representation Learning
Yibing Wei · Zhicheng Yang · Hang Zhou · Mei Han · Pedro Morgado · Jui-Hsin Lai
2022 : Hearing Touch: Using Contact Microphones for Robot Manipulation
Shaden Alshammari · Victoria Dean · Tess Hellebrekers · Pedro Morgado · Abhinav Gupta
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta
2022 : Offline Reinforcement Learning on Real Robot with Realistic Data Sources
Gaoyue Zhou · Liyiming Ke · Siddhartha Srinivasa · Abhinav Gupta · Aravind Rajeswaran · Vikash Kumar
2022 Poster: A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo · Pedro Morgado
2021 Oral: Interesting Object, Curious Agent: Learning Task-Agnostic Exploration
Simone Parisi · Victoria Dean · Deepak Pathak · Abhinav Gupta
2021 Poster: Bridging the Imitation Gap by Adaptive Insubordination
Luca Weihs · Unnat Jain · Iou-Jen Liu · Jordi Salvador · Svetlana Lazebnik · Aniruddha Kembhavi · Alex Schwing
2021 Poster: No RL, No Simulation: Learning to Navigate without Navigating
Meera Hahn · Devendra Singh Chaplot · Shubham Tulsiani · Mustafa Mukadam · James Rehg · Abhinav Gupta
2021 Poster: Interesting Object, Curious Agent: Learning Task-Agnostic Exploration
Simone Parisi · Victoria Dean · Deepak Pathak · Abhinav Gupta
2020 : QA: Abhinav Gupta
Abhinav Gupta
2020 : Invited Talk: Abhinav Gupta
Abhinav Gupta
2020 Poster: Neural Dynamic Policies for End-to-End Sensorimotor Learning
Shikhar Bahl · Mustafa Mukadam · Abhinav Gupta · Deepak Pathak
2020 Poster: Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases
Senthil Purushwalkam · Abhinav Gupta
2020 Spotlight: Neural Dynamic Policies for End-to-End Sensorimotor Learning
Shikhar Bahl · Mustafa Mukadam · Abhinav Gupta · Deepak Pathak
2020 Poster: See, Hear, Explore: Curiosity via Audio-Visual Association
Victoria Dean · Shubham Tulsiani · Abhinav Gupta
2020 Poster: MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
Saim Wani · Shivansh Patel · Unnat Jain · Angel Chang · Manolis Savva
2020 Poster: Object Goal Navigation using Goal-Oriented Semantic Exploration
Devendra Singh Chaplot · Dhiraj Prakashchand Gandhi · Abhinav Gupta · Russ Salakhutdinov
2019 : Poster session
Candace Ross · Yassine Mrabet · Sanjay Subramanian · Geoffrey Cideron · Jesse Mu · Suvrat Bhooshan · Eda Okur Kavil · Jean-Benoit Delbrouck · Yen-Ling Kuo · Nicolas Lair · Gabriel Ilharco · T.S. Jayram · Alba María Herrera Palacio · Chihiro Fujiyama · Olivier Tieleman · Anna Potapenko · Guan-Lin Chao · Thomas Sutter · Olga Kovaleva · Farley Lai · Xin Wang · Vasu Sharma · Catalina Cangea · Nikhil Krishnaswamy · Yuta Tsuboi · Alexander Kuhnle · Khanh Nguyen · Dian Yu · Homagni Saha · Jiannan Xiang · Vijay Venkataraman · Ankita Kalra · Ning Xie · Derek Doran · Travis Goodwin · Asim Kadav · Shabnam Daghaghi · Jason Baldridge · Jialin Wu · Jingxiang Lin · Unnat Jain
2019 Poster: Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller
Pratyusha Sharma · Deepak Pathak · Abhinav Gupta
2019 Poster: TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin · Unnat Jain · Alex Schwing
2018 Poster: Hardware Conditioned Policies for Multi-Robot Transfer Learning
Tao Chen · Adithyavairavan Murali · Abhinav Gupta
2018 Poster: Beyond Grids: Learning Graph Representations for Visual Recognition
Yin Li · Abhinav Gupta
2018 Poster: Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias
Abhinav Gupta · Adithyavairavan Murali · Dhiraj Prakashchand Gandhi · Lerrel Pinto
2016 : Invited Talk - Self Supervised Learning of Visual Representations
Abhinav Gupta
2016 : Abhinav Gupta
Abhinav Gupta
2013 Poster: Mid-level Visual Element Discovery as Discriminative Mode Seeking
Carl Doersch · Abhinav Gupta · Alexei A Efros
2010 Poster: Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
David C Lee · Abhinav Gupta · Martial Hebert · Takeo Kanade
2008 Poster: A "Shape Aware" Model for semi-supervised Learning of Objects and its Context
Abhinav Gupta · Jianbo Shi · Larry Davis
2008 Spotlight: A "Shape Aware" Model for semi-supervised Learning of Objects and its Context
Abhinav Gupta · Jianbo Shi · Larry Davis