Competition
Workshop for URGENT 2024 Challenge
Wangyou Zhang · Robin Scheibler · Kohei Saijo · Samuele Cornell · Chenda Li · Zhaoheng Ni · Anurag Kumar · Marvin Sach · Wei Wang · Yihui Fu · Shinji Watanabe · Tim Fingscheidt · Yanmin Qian
West Meeting Room 215, 216
Speech enhancement (SE) is the task of improving the quality of the desired speech while suppressing other interfering signals. Tremendous progress has been achieved in deep-learning-based SE approaches over the past decade. However, existing SE studies are often limited in one or more of the following aspects: coverage of SE sub-tasks, diversity and amount of data (especially real-world evaluation data), and diversity of evaluation metrics. As a first step toward filling this gap, we establish a novel SE challenge, called URGENT, to promote research towards universal SE. It concentrates on the universality, robustness, and generalizability of SE approaches. In the challenge, we extend the conventionally narrow definition of SE to cover different sub-tasks, thus allowing exploration of the limits of current SE models. We start with four SE sub-tasks: denoising, dereverberation, bandwidth extension, and declipping. Note that handling these sub-tasks within a single SE model has been challenging and underexplored in the SE literature due to the distinct data formats of the different tasks; as a result, most existing SE approaches are designed for only a specific sub-task. To address this issue, we propose a technically novel framework that unifies all of these sub-tasks in a single model and is compatible with most existing SE approaches. Several state-of-the-art baselines with different popular architectures are provided for this challenge, including TF-GridNet, BSRNN, and Conv-TasNet. We also address data diversity and amount by collecting abundant public speech and noise data from different domains, which allows the construction of diverse training and evaluation data. Additional real recordings are further used to evaluate robustness and generalizability. Unlike existing SE challenges, we adopt a wide range of evaluation metrics to provide comprehensive insights into the true capability of both generative and discriminative SE approaches. We expect this challenge will not only provide valuable insights into the current status of SE research, but also attract more research towards building universal SE models with strong robustness and good generalizability.
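To make the four sub-tasks concrete, below is a minimal sketch of how the corresponding distortions (noise, reverberation, bandwidth reduction, clipping) could be simulated on clean speech to produce paired training data for a single universal SE model. This is an illustrative assumption about the data-simulation setup, not the official URGENT pipeline; all function names and parameter ranges are hypothetical.

```python
# Hypothetical sketch: simulate the four URGENT sub-task distortions on clean speech.
# Not the official challenge code; parameter ranges are illustrative assumptions.
import numpy as np
import scipy.signal


def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target SNR (denoising sub-task)."""
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve speech with a room impulse response (dereverberation sub-task)."""
    return scipy.signal.fftconvolve(speech, rir, mode="full")[: len(speech)]


def bandlimit(speech: np.ndarray, fs: int, cutoff_hz: float) -> np.ndarray:
    """Low-pass filter to emulate reduced bandwidth (bandwidth-extension sub-task)."""
    sos = scipy.signal.butter(8, cutoff_hz, btype="low", fs=fs, output="sos")
    return scipy.signal.sosfiltfilt(sos, speech)


def clip(speech: np.ndarray, clip_ratio: float) -> np.ndarray:
    """Hard-clip the waveform at a fraction of its peak (declipping sub-task)."""
    threshold = clip_ratio * np.max(np.abs(speech))
    return np.clip(speech, -threshold, threshold)


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(fs)   # stand-in for a clean utterance
    noise = rng.standard_normal(fs)   # stand-in for a noise recording
    rir = np.exp(-np.arange(int(0.3 * fs)) / (0.05 * fs)) * rng.standard_normal(int(0.3 * fs))

    # Randomly compose distortions so one model sees all sub-tasks during training.
    degraded = clean.copy()
    if rng.random() < 0.5:
        degraded = add_reverb(degraded, rir)
    degraded = add_noise(degraded, noise, snr_db=rng.uniform(0, 20))
    if rng.random() < 0.5:
        degraded = bandlimit(degraded, fs, cutoff_hz=rng.uniform(2000, 7000))
    if rng.random() < 0.5:
        degraded = clip(degraded, clip_ratio=rng.uniform(0.2, 0.9))
    print(degraded.shape, clean.shape)  # one paired (degraded, clean) training example
```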
Schedule
Sat 1:30 p.m. - 1:45 p.m. | Opening Remarks (Presentation) | Samuele Cornell
Sat 1:45 p.m. - 2:00 p.m. | Presentation from team 'Bytedance-SMT-Audio' (Oral Presentation) | Xiaohuai Le
Sat 2:00 p.m. - 2:15 p.m. | Presentation from team 'NJU-AALab' (Oral Presentation) | Xiaobin Rong
Sat 2:15 p.m. - 2:30 p.m. | Presentation from team 'NAVS' (Oral Presentation) | Rong Chao
Sat 2:30 p.m. - 3:30 p.m. | Invited Talk | John R. Hershey
Sat 3:30 p.m. - 3:45 p.m. | Presentation from team 'ALPACA' (Oral Presentation) | Seungu Han
Sat 3:45 p.m. - 4:00 p.m. | Presentation from team 'Hamburgers' (Oral Presentation) | Julius Richter
Sat 4:00 p.m. - 4:10 p.m. | Closing Remarks