Timezone: »
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two-million unlabeled videos. Unlabeled video has the advantage that it can be economically acquired at massive scales, yet contains useful signals about natural sound. We propose a student-teacher training procedure which transfers discriminative visual knowledge from well established visual recognition models into the sound modality using unlabeled video as a bridge. Our sound representation yields significant performance improvements over the state-of-the-art results on standard benchmarks for acoustic scene/object classification. Visualizations suggest some high-level semantics automatically emerge in the sound network, even though it is trained without ground truth labels.
Author Information
Yusuf Aytar (MIT)
Carl Vondrick (MIT)
Antonio Torralba (MIT)
More from the Same Authors
-
2022 Poster: Procedural Image Programs for Representation Learning »
Manel Baradad · Richard Chen · Jonas Wulff · Tongzhou Wang · Rogerio Feris · Antonio Torralba · Phillip Isola -
2022 Poster: Learning Neural Acoustic Fields »
Andrew Luo · Yilun Du · Michael Tarr · Josh Tenenbaum · Antonio Torralba · Chuang Gan -
2022 Poster: Pre-Trained Language Models for Interactive Decision-Making »
Shuang Li · Xavier Puig · Chris Paxton · Yilun Du · Clinton Wang · Linxi Fan · Tao Chen · De-An Huang · Ekin Akyürek · Anima Anandkumar · Jacob Andreas · Igor Mordatch · Antonio Torralba · Yuke Zhu -
2022 Poster: ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment »
Joseph DelPreto · Chao Liu · Yiyue Luo · Michael Foshey · Yunzhu Li · Antonio Torralba · Wojciech Matusik · Daniela Rus -
2020 Poster: Debiased Contrastive Learning »
Ching-Yao Chuang · Joshua Robinson · Yen-Chen Lin · Antonio Torralba · Stefanie Jegelka -
2020 Spotlight: Debiased Contrastive Learning »
Ching-Yao Chuang · Joshua Robinson · Yen-Chen Lin · Antonio Torralba · Stefanie Jegelka -
2018 : Panel Discussion »
Antonio Torralba · Douwe Kiela · Barbara Landau · Angeliki Lazaridou · Joyce Chai · Christopher Manning · Stevan Harnad · Roozbeh Mottaghi -
2018 : Antonio Torralba - Learning to See and Hear »
Antonio Torralba -
2018 Poster: Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding »
Kexin Yi · Jiajun Wu · Chuang Gan · Antonio Torralba · Pushmeet Kohli · Josh Tenenbaum -
2018 Poster: 3D-Aware Scene Manipulation via Inverse Graphics »
Shunyu Yao · Tzu Ming Hsu · Jun-Yan Zhu · Jiajun Wu · Antonio Torralba · Bill Freeman · Josh Tenenbaum -
2018 Spotlight: Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding »
Kexin Yi · Jiajun Wu · Chuang Gan · Antonio Torralba · Pushmeet Kohli · Josh Tenenbaum -
2016 Poster: Generating Videos with Scene Dynamics »
Carl Vondrick · Hamed Pirsiavash · Antonio Torralba -
2015 Poster: Skip-Thought Vectors »
Jamie Kiros · Yukun Zhu · Russ Salakhutdinov · Richard Zemel · Raquel Urtasun · Antonio Torralba · Sanja Fidler -
2015 Poster: Where are they looking? »
Adria Recasens · Aditya Khosla · Carl Vondrick · Antonio Torralba -
2015 Spotlight: Where are they looking? »
Adria Recasens · Aditya Khosla · Carl Vondrick · Antonio Torralba -
2015 Poster: Learning visual biases from human imagination »
Carl Vondrick · Hamed Pirsiavash · Aude Oliva · Antonio Torralba -
2011 Poster: Video Annotation and Tracking with Active Learning »
Carl Vondrick · Deva Ramanan