Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in single-source scenes. Then, class-aware object localization maps are generated in the cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are selected by matching the audio and visual object category distributions, where audiovisual consistency serves as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the location of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
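The matching step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' released code: the pooling of each class-aware localization map to a single score, the use of KL divergence as the consistency measure, the probability threshold, and all function names here are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visual_category_distribution(class_maps):
    """Average-pool each class-aware localization map (H x W nested lists)
    to one score per class, then normalize (assumed pooling scheme)."""
    pooled = [
        sum(sum(row) for row in cmap) / (len(cmap) * len(cmap[0]))
        for cmap in class_maps
    ]
    return softmax(pooled)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), used here as a hypothetical audiovisual consistency loss."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def select_sounding_classes(audio_dist, threshold=0.3):
    """Keep classes whose audio probability reaches an assumed threshold."""
    return [k for k, p in enumerate(audio_dist) if p >= threshold]


# Toy example: three object classes, 2 x 2 localization maps.
class_maps = [
    [[0.9, 0.8], [0.7, 0.9]],  # class 0: strong response
    [[0.1, 0.0], [0.2, 0.1]],  # class 1: weak response
    [[0.0, 0.1], [0.0, 0.0]],  # class 2: nearly silent
]
audio_dist = softmax([2.0, 0.1, -1.0])  # hypothetical audio class logits
visual_dist = visual_category_distribution(class_maps)
consistency_loss = kl_divergence(audio_dist, visual_dist)
sounding = select_sounding_classes(audio_dist)
```

In this toy setup, a lower `consistency_loss` means the audio and visual category distributions agree more closely, which is the kind of signal a self-supervised objective could minimize without class labels.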
Author Information
Di Hu (Renmin University of China)
Rui Qian (Shanghai Jiao Tong University)
Minyue Jiang (Baidu Inc.)
Xiao Tan (Baidu Inc.)
Shilei Wen (BAIDU)
Errui Ding (Baidu Inc.)
Weiyao Lin (Shanghai Jiao Tong University)
Dejing Dou (Baidu)
More from the Same Authors
- 2022 Poster: Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement
  Yan Li · Xinjiang Lu · Yaqing Wang · Dejing Dou
- 2022 Poster: Delving into Sequential Patches for Deepfake Detection
  Jiazhi Guan · Hang Zhou · Zhibin Hong · Errui Ding · Jingdong Wang · Chengbin Quan · Youjian Zhao
- 2022 Poster: InterpretDL: Explaining Deep Models in PaddlePaddle
  Xuhong Li · Haoyi Xiong · Xingjian Li · Xuanyu Wu · Zeyu Chen · Dejing Dou
- 2022: A Closer Look at Novel Class Discovery from the Labeled Set
  ZIYUN LI · Jona Otholt · Ben Dai · Di Hu · Christoph Meinel · Haojin Yang
- 2022: SMILE: Sample-to-feature MIxup for Efficient Transfer LEarning
  Xingjian Li · Haoyi Xiong · Cheng-Zhong Xu · Dejing Dou
- 2022: A Simple Framework for Active Learning to Rank
  Qingzhong Wang · Haifang Li · Haoyi Xiong · Wen Wang · Jiang Bian · Yu Lu · Shuaiqiang Wang · zhicong cheng · Dawei Yin · Dejing Dou
- 2022: A Comparative Survey of Deep Active Learning
  Xueying Zhan · Qingzhong Wang · Kuan-Hao Huang · Haoyi Xiong · Dejing Dou · Antoni Chan
- 2023 Poster: BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
  Zelin Ni · Hang Yu · Shizhan Liu · Jianguo Li · Weiyao Lin
- 2023 Poster: HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
  junkun yuan · Zhang · Hao Zhou · Jian Wang · Zhongwei Qiu · Zhiyin Shao · Shaofeng Zhang · Sifan Long · Kun Kuang · Kun Yao · Junyu Han · Errui Ding · Lanfen Lin · Fei Wu · Jingdong Wang
- 2022 Spotlight: Delving into Sequential Patches for Deepfake Detection
  Jiazhi Guan · Hang Zhou · Zhibin Hong · Errui Ding · Jingdong Wang · Chengbin Quan · Youjian Zhao
- 2022 Spotlight: RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
  Jian Wang · Chenhui Gou · Qiman Wu · Haocheng Feng · Junyu Han · Errui Ding · Jingdong Wang
- 2022 Spotlight: Lightning Talks 2B-1
  Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li
- 2022 Spotlight: InterpretDL: Explaining Deep Models in PaddlePaddle
  Xuhong Li · Haoyi Xiong · Xingjian Li · Xuanyu Wu · Zeyu Chen · Dejing Dou
- 2022 Spotlight: Lightning Talks 1A-1
  Siba Smarak Panigrahi · Xuhong Li · Mikhail Usvyatsov · Shaohan Chen · Sohan Patnaik · Haoyi Xiong · Nikolaos V Sahinidis · Rafael Ballester-Ripoll · Chuanhou Gao · Xingjian Li · Konrad Schindler · Xuanyu Wu · Zeyu Chen · Dejing Dou
- 2022 Poster: Spatial Pruned Sparse Convolution for Efficient 3D Object Detection
  Jianhui Liu · Yukang Chen · Xiaoqing Ye · Zhuotao Tian · Xiao Tan · Xiaojuan Qi
- 2022 Poster: RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
  Jian Wang · Chenhui Gou · Qiman Wu · Haocheng Feng · Junyu Han · Errui Ding · Jingdong Wang
- 2022 Poster: AutoMS: Automatic Model Selection for Novelty Detection with Error Rate Control
  Yifan Zhang · Haiyan Jiang · Haojie Ren · Changliang Zou · Dejing Dou
- 2022 Poster: Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning
  Yanpeng Sun · Qiang Chen · Xiangyu He · Jian Wang · Haocheng Feng · Junyu Han · Errui Ding · Jian Cheng · Zechao Li · Jingdong Wang
- 2021: [O6] Explaining Information Flow Inside Vision Transformers Using Markov Chain
  Tingyi Yuan · Xuhong Li · Haoyi Xiong · Dejing Dou
- 2021 Poster: Dual-stream Network for Visual Recognition
  Mingyuan Mao · peng gao · Renrui Zhang · Honghui Zheng · Teli Ma · Yan Peng · Errui Ding · Baochang Zhang · Shumin Han
- 2020 Poster: Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation
  Yuxi Li · Ning Xu · Jinlong Peng · John See · Weiyao Lin
- 2018 Poster: Compact Generalized Non-local Network
  Kaiyu Yue · Ming Sun · Yuchen Yuan · Feng Zhou · Errui Ding · Fuxin Xu