Competition
Workshop for URGENT 2024 Challenge
Wangyou Zhang · Robin Scheibler · Kohei Saijo · Samuele Cornell · Chenda Li · Zhaoheng Ni · Anurag Kumar · Marvin Sach · Wei Wang · Yihui Fu · Shinji Watanabe · Tim Fingscheidt · Yanmin Qian
West Meeting Room 215, 216
Speech enhancement (SE) is the task of improving the quality of the desired speech while suppressing other interfering signals. Tremendous progress has been achieved in deep-learning-based SE approaches over the past decade. However, existing SE studies are often limited in one or more of the following aspects: coverage of SE sub-tasks, diversity and amount of data (especially real-world evaluation data), and diversity of evaluation metrics. As a first step toward filling this gap, we establish a novel SE challenge, called URGENT, to promote research towards universal SE. It concentrates on the universality, robustness, and generalizability of SE approaches. In the challenge, we extend the conventionally narrow definition of SE to cover different sub-tasks, thus allowing exploration of the limits of current SE models. We start with four SE sub-tasks: denoising, dereverberation, bandwidth extension, and declipping. Note that handling these sub-tasks within a single SE model has been challenging and underexplored in the SE literature due to the distinct data formats of the different tasks; as a result, most existing SE approaches are designed for only a specific sub-task. To address this issue, we propose a technically novel framework that unifies all of these sub-tasks in a single model and is compatible with most existing SE approaches. Several state-of-the-art baselines with different popular architectures are provided for this challenge, including TF-GridNet, BSRNN, and Conv-TasNet. We also address data diversity and amount by collecting abundant public speech and noise data from different domains, which allows the construction of diverse training and evaluation data. Additional real recordings are further used to evaluate robustness and generalizability. Unlike existing SE challenges, we adopt a wide range of evaluation metrics to provide comprehensive insights into the true capability of both generative and discriminative SE approaches. We expect this challenge will not only provide valuable insights into the current status of SE research, but also attract more research towards building universal SE models with strong robustness and good generalizability.
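To make the four sub-tasks concrete, below is a minimal sketch of how the corresponding distortions (noise, reverberation, bandwidth reduction, clipping) could be simulated on clean speech to produce paired training data for a single universal SE model. This is an illustrative assumption about the data-simulation setup, not the official URGENT pipeline; all function names and parameter ranges are hypothetical.

```python
# Hypothetical sketch: simulate the four URGENT sub-task distortions on clean speech.
# Not the official challenge code; parameter ranges are illustrative assumptions.
import numpy as np
import scipy.signal


def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target SNR (denoising sub-task)."""
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve speech with a room impulse response (dereverberation sub-task)."""
    return scipy.signal.fftconvolve(speech, rir, mode="full")[: len(speech)]


def bandlimit(speech: np.ndarray, fs: int, cutoff_hz: float) -> np.ndarray:
    """Low-pass filter to emulate reduced bandwidth (bandwidth-extension sub-task)."""
    sos = scipy.signal.butter(8, cutoff_hz, btype="low", fs=fs, output="sos")
    return scipy.signal.sosfiltfilt(sos, speech)


def clip(speech: np.ndarray, clip_ratio: float) -> np.ndarray:
    """Hard-clip the waveform at a fraction of its peak (declipping sub-task)."""
    threshold = clip_ratio * np.max(np.abs(speech))
    return np.clip(speech, -threshold, threshold)


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(fs)   # stand-in for a clean utterance
    noise = rng.standard_normal(fs)   # stand-in for a noise recording
    rir = np.exp(-np.arange(int(0.3 * fs)) / (0.05 * fs)) * rng.standard_normal(int(0.3 * fs))

    # Randomly compose distortions so one model sees all sub-tasks during training.
    degraded = clean.copy()
    if rng.random() < 0.5:
        degraded = add_reverb(degraded, rir)
    degraded = add_noise(degraded, noise, snr_db=rng.uniform(0, 20))
    if rng.random() < 0.5:
        degraded = bandlimit(degraded, fs, cutoff_hz=rng.uniform(2000, 7000))
    if rng.random() < 0.5:
        degraded = clip(degraded, clip_ratio=rng.uniform(0.2, 0.9))
    print(degraded.shape, clean.shape)  # one paired (degraded, clean) training example
```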
Schedule
Sat 1:30 p.m. - 1:45 p.m. | Opening Remarks (Presentation) | Samuele Cornell
Sat 1:45 p.m. - 2:00 p.m. | Presentation from team 'Bytedance-SMT-Audio' (Oral Presentation) | Xiaohuai Le
Sat 2:00 p.m. - 2:15 p.m. | Presentation from team 'NJU-AALab' (Oral Presentation) | Xiaobin Rong
Sat 2:15 p.m. - 2:30 p.m. | Presentation from team 'NAVS' (Oral Presentation) | Rong Chao
Sat 2:30 p.m. - 3:30 p.m. | Invited Talk | John R. Hershey
Sat 3:30 p.m. - 3:45 p.m. | Presentation from team 'ALPACA' (Oral Presentation) | Seungu Han
Sat 3:45 p.m. - 4:00 p.m. | Presentation from team 'Hamburgers' (Oral Presentation) | Julius Richter
Sat 4:00 p.m. - 4:10 p.m. | Closing Remarks