Skip to yearly menu bar Skip to main content


Talk
in
Competition: Competition Track Day 1: Overviews + Breakout Sessions

HEAR 2021: Holistic Evaluation of Audio Representations + Q&A

Joseph Turian · Jordan Shier · Bhiksha Raj · Bjoern Schuller · Christian Steinmetz · George Tzanetakis · Gissel Velarde · Kirk McNally · Max Henry · Nicolas Pinto · Yonatan Bisk · George Tzanetakis · Camille Noufi · Dorien Herremans · Jesse Engel · Justin Salamon · Prany Manocha · Philippe Esling · Shinji Watanabe


Abstract:

Humans can infer a wide range of properties from a perceived sound, such as information about the source (e.g. what generated the sound? where is it coming from?), the information the sound conveys (this is a word that means X, this is a musical note in scale Y), and how it compares to other sounds (these two sounds come/don't come from the same source and are/aren't identical). Can any one learned representation do the same? The aim of this competition is to develop a general-purpose audio representation that provides a meaningful basis for learning in a wide variety of tasks and scenarios. We challenge participants with the following questions: Is it possible to develop a single representation that models all psychoacoustic phenomena? What approach best generalizes to a wide range of downstream audio tasks without fine-tuning? What audio representation allows researchers to formulate and solve novel and societally-valuable problems in simple, repeatable ways? We will evaluate audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, medical audio, and music. In the spirit of shared exchange, all participants must submit an audio embedding model, following a common API, that is general-purpose, open-source, and freely available to use.