Poster
Weakly-Supervised Audio-Visual Segmentation
Shentong Mo · Bhiksha Raj
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive, manually designed architecture trained with a large number of pixel-accurate masks as supervision. However, such pixel-level masks are expensive to annotate and are not available in all cases. In this work, we aim to simplify the supervision to instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, WS-AVS, that learns multi-scale audio-visual alignment through multi-scale multiple-instance contrastive learning. Extensive experiments on AVSBench demonstrate the effectiveness of WS-AVS for weakly-supervised audio-visual segmentation in both single-source and multi-source scenarios.
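To make the phrase "multi-scale multiple-instance contrastive learning" more concrete, the following is a minimal NumPy sketch of one common way such a loss can be set up. It is not the WS-AVS implementation: the abstract does not specify the encoders, the pooling operator, or the temperature, so the max-pooling over spatial locations, the symmetric InfoNCE form, the value `temperature=0.07`, and all function names here are assumptions made for illustration only.

```python
import numpy as np


def _l2_normalize(x, axis, eps=1e-8):
    # Scale vectors along `axis` to unit length (eps avoids division by zero).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def _log_softmax(x, axis):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def mil_contrastive_loss(audio, visual, temperature=0.07):
    """Hypothetical multiple-instance contrastive loss at a single scale.

    audio:  (B, D) one audio embedding per clip.
    visual: (B, D, H, W) visual feature map per frame.
    Each frame is treated as a bag of H*W location features; the bag score
    for an (audio, frame) pair is the best-matching location (assumed max-pool).
    """
    batch = audio.shape[0]
    a = _l2_normalize(audio, axis=1)
    v = _l2_normalize(visual, axis=1)
    # sim[i, j, h, w]: cosine similarity of audio i with location (h, w) of frame j.
    sim = np.einsum("id,jdhw->ijhw", a, v)
    # Bag-level score per (audio, frame) pair, scaled by the temperature.
    scores = sim.reshape(batch, batch, -1).max(axis=2) / temperature
    idx = np.arange(batch)
    # Symmetric InfoNCE: matched pairs lie on the diagonal.
    loss_a2v = -_log_softmax(scores, axis=1)[idx, idx].mean()
    loss_v2a = -_log_softmax(scores, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2v + loss_v2a)


def multi_scale_mil_loss(audio_per_scale, visual_per_scale, temperature=0.07):
    # Average the single-scale loss over feature maps of different resolutions.
    losses = [
        mil_contrastive_loss(a, v, temperature)
        for a, v in zip(audio_per_scale, visual_per_scale)
    ]
    return float(np.mean(losses))
```

In this sketch only clip-level pairing between audio and frames is used as supervision; no pixel-level mask enters the loss, which is the sense in which the setup is weakly supervised. The per-location similarity map `sim` is the quantity one would threshold or decode into a mask at inference time, though how WS-AVS does this is not stated in the abstract.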
Author Information
Shentong Mo (CMU)
Bhiksha Raj (Carnegie Mellon University)
More from the Same Authors
- 2021: Simulated Annealing for Neural Architecture Search
  Shentong Mo · Jingfei Xia · Pinxu Ren
- 2021: Adaptive Fine-tuning for Vision and Language Pre-trained Models
  Shentong Mo · Jingfei Xia · Ihor Markevych
- 2021: Multi-modal Self-supervised Pre-training for Large-scale Genome Data
  Shentong Mo · Xi Fu · Chenyang Hong · Yizhen Chen · Yuxuan Zheng · Xiangru Tang · Yanyan Lan · Zhiqiang Shen · Eric Xing
- 2023: Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models
  Shentong Mo · Zhun Sun · Chao Li
- 2023 Poster: DiffComplete: Diffusion-based Generative 3D Shape Completion
  Ruihang Chu · Enze Xie · Shentong Mo · Zhenguo Li · Matthias Niessner · Chi-Wing Fu · Jiaya Jia
- 2023 Poster: DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
  Shentong Mo · Enze Xie · Ruihang Chu · Lanqing Hong · Matthias Niessner · Zhenguo Li
- 2023 Poster: Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments
  Thanh-Dat Truong · Hoang-Quan Nguyen · Bhiksha Raj · Khoa Luu
- 2023 Poster: PaintSeg: Painting Pixels for Training-free Segmentation
  Xiang Li · Chung-Ching Lin · Yinpeng Chen · Zicheng Liu · Jinglu Wang · Rita Singh · Bhiksha Raj
- 2023 Poster: Training on Foveated Images Improves Robustness to Adversarial Attacks
  Muhammad Shah · Aqsa Kashaf · Bhiksha Raj
- 2022 Poster: USB: A Unified Semi-supervised Learning Benchmark for Classification
  Yidong Wang · Hao Chen · Yue Fan · Wang SUN · Ran Tao · Wenxin Hou · Renjie Wang · Linyi Yang · Zhi Zhou · Lan-Zhe Guo · Heli Qi · Zhen Wu · Yu-Feng Li · Satoshi Nakamura · Wei Ye · Marios Savvides · Bhiksha Raj · Takahiro Shinozaki · Bernt Schiele · Jindong Wang · Xing Xie · Yue Zhang
- 2022 Poster: Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
  Shentong Mo · Yapeng Tian
- 2022 Poster: A Closer Look at Weakly-Supervised Audio-Visual Source Localization
  Shentong Mo · Pedro Morgado
- 2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice
  Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova
- 2021: Poster Session 2 (gather.town)
  Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov
- 2021: HEAR 2021: Holistic Evaluation of Audio Representations + Q&A
  Joseph Turian · Jordan Shier · Bhiksha Raj · Bjoern Schuller · Christian Steinmetz · George Tzanetakis · Gissel Velarde · Kirk McNally · Max Henry · Nicolas Pinto · Yonatan Bisk · George Tzanetakis · Camille Noufi · Dorien Herremans · Jesse Engel · Justin Salamon · Prany Manocha · Philippe Esling · Shinji Watanabe
- 2020 Poster: Is normalization indispensable for training deep neural network?
  Jie Shao · Kai Hu · Changhu Wang · Xiangyang Xue · Bhiksha Raj
- 2020 Oral: Is normalization indispensable for training deep neural network?
  Jie Shao · Kai Hu · Changhu Wang · Xiangyang Xue · Bhiksha Raj
- 2019 Poster: Face Reconstruction from Voice using Generative Adversarial Networks
  Yandong Wen · Bhiksha Raj · Rita Singh
- 2017: Poster Session Music and environmental sounds
  Oriol Nieto · Jordi Pons · Bhiksha Raj · Tycho Tax · Benjamin Elizalde · Juhan Nam · Anurag Kumar
- 2012 Poster: Unsupervised Structure Discovery for Semantic Analysis of Audio
  Sourish Chaudhuri · Bhiksha Raj
- 2010 Poster: Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
  Manas A Pathak · Shantanu Rane · Bhiksha Raj
- 2009 Poster: A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
  Paris Smaragdis · Madhusudana Shashanka · Bhiksha Raj