Fri Dec 8th 08:00 AM -- 06:30 PM @ 201 A
Machine Learning for Audio Signal Processing (ML4Audio)
Hendrik Purwins · Bob L. Sturm · Mark Plumbley

Audio signal processing is currently undergoing a paradigm change, in which data-driven machine learning is replacing hand-crafted feature design. This has led some to ask whether audio signal processing is still useful in the "era of machine learning." Many challenges remain, new and old, including the interpretation of learned models in high-dimensional spaces, problems in data-poor domains, adversarial examples, high computational requirements, and research driven by companies' large in-house datasets, which is ultimately not reproducible.

ML4Audio aims to promote progress, systematization, understanding, and convergence in the application of machine learning to audio signal processing. Specifically, we are interested in work that demonstrates novel applications of machine learning techniques to audio data, as well as methodological considerations in merging machine learning with audio signal processing. We welcome contributions on topics including, but not limited to:
- audio information retrieval using machine learning;
- audio synthesis with given contextual or musical constraints using machine learning;
- audio source separation using machine learning;
- audio transformations (e.g., sound morphing, style transfer) using machine learning;
- unsupervised learning, online learning, one-shot learning, reinforcement learning, and incremental learning for audio;
- applications/optimization of generative adversarial networks for audio;
- cognitively inspired machine learning models of sound cognition;
- mathematical foundations of machine learning for audio signal processing.

This workshop especially targets researchers, developers, and musicians in academia and industry working in the areas of music information retrieval (MIR), audio processing, hearing instruments, speech processing, musical HCI, musicology, music technology, music entertainment, and composition.

ML4Audio Organisation Committee:
Hendrik Purwins, Aalborg University Copenhagen, Denmark
Bob L. Sturm, Queen Mary University of London, UK
Mark Plumbley, University of Surrey, UK

Program Committee:
Abeer Alwan (University of California, Los Angeles)
Jon Barker (University of Sheffield)
Sebastian Böck (Johannes Kepler University Linz)
Mads Græsbøll Christensen (Aalborg University)
Maximo Cobos (Universitat de Valencia)
Sander Dieleman (Google DeepMind)
Monika Dörfler (University of Vienna)
Shlomo Dubnov (UC San Diego)
Philippe Esling (IRCAM)
Cédric Févotte (IRIT)
Emilia Gómez (Universitat Pompeu Fabra)
Emanuël Habets (International Audio Laboratories Erlangen)
Jan Larsen (Technical University of Denmark)
Marco Marchini (Spotify)
Rafael Ramirez (Universitat Pompeu Fabra)
Gaël Richard (TELECOM ParisTech)
Fatemeh Saki (UT Dallas)
Sanjeev Satheesh (Baidu SVAIL)
Jan Schlüter (Austrian Research Institute for Artificial Intelligence)
Joan Serrà (Telefónica)
Malcolm Slaney (Google)
Emmanuel Vincent (INRIA Nancy)
Gerhard Widmer (Austrian Research Institute for Artificial Intelligence)
Tao Zhang (Starkey Hearing Technologies)

08:00 AM Overture (Talk)
Hendrik Purwins
08:15 AM Acoustic word embeddings for speech search (Invited Talk)
Karen Livescu
08:45 AM Learning Word Embeddings from Speech (Talk)
Jim Glass, Yu-An Chung
09:05 AM Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise (Talk)
Soumitro Chakrabarty, Emanuël Habets
09:25 AM Adaptive Front-ends for End-to-end Source Separation (Talk)
Shrikant Venkataramani, Paris Smaragdis
09:45 AM Poster Session Speech: source separation, enhancement, recognition, synthesis (Coffee break and poster session)
Shuayb Zarar, Rasool Fakoor, Sri Harsha Dumpala, Minje Kim, Paris Smaragdis, Mohit Dubey, Jong Hwan Ko, Sakriani Sakti, Yuxuan Wang, Lijiang Guo, Garrett T Kenyon, Andros Tjandra, Tycho Tax, Younggun Lee
11:00 AM Learning and transforming sound for interactive musical applications (Invited Talk)
Marco Marchini
11:30 AM Compact Recurrent Neural Network based on Tensor Train for Polyphonic Music Modeling (Talk)
Sakriani Sakti
11:50 AM Singing Voice Separation using Generative Adversarial Networks (Talk)
Hyeong-seok Choi, Kyogu Lee
12:10 PM Audio Cover Song Identification using Convolutional Neural Network (Talk)
Sungkyun Chang, Kyogu Lee
12:30 PM Lunch Break (Break)
01:30 PM Polyphonic piano transcription using deep neural networks (Invited Talk)
Douglas Eck
02:00 PM Deep learning for music recommendation and generation (Invited Talk)
Sander Dieleman
02:30 PM Exploring Ad Effectiveness using Acoustic Features (Invited Talk)
Matt Prockup, Puya Vahabi
03:00 PM Poster Session Music and environmental sounds (Coffee break and poster session)
Oriol Nieto, Jordi Pons, Bhiksha Raj, Tycho Tax, Benjamin Elizalde, Juhan Nam, Anurag Kumar
04:00 PM Sight and sound (Invited Talk)
Bill Freeman
04:30 PM k-shot Learning of Acoustic Context (Talk)
Bert de Vries
04:50 PM Towards Learning Semantic Audio Representations from Unlabeled Data (Talk)
Aren Jansen
05:10 PM Cost-sensitive detection with variational autoencoders for environmental acoustic sensing (Talk)
Yunpeng Li, Stephen J Roberts
05:30 PM Break
05:45 PM Panel: Machine learning and audio signal processing: State of the art and future perspectives (Discussion Panel)
Sepp Hochreiter, Bo Li, Karen Livescu, Arindam Mandal, Oriol Nieto, Malcolm Slaney, Hendrik Purwins