Many single-channel signal decomposition techniques rely on a low-rank factorization of a time-frequency transform. In particular, nonnegative matrix factorization (NMF) of the spectrogram -- the (power) magnitude of the short-time Fourier transform (STFT) -- has been considered in many audio applications. In this setting, NMF with the Itakura-Saito divergence was shown to underly a generative Gaussian composite model (GCM) of the STFT, a step forward from more empirical approaches based on ad-hoc transform and divergence specifications. Still, the GCM is not yet a generative model of the raw signal itself, but only of its STFT. The work presented in this paper fills in this ultimate gap by proposing a novel signal synthesis model with low-rank time-frequency structure. In particular, our new approach opens doors to multi-resolution representations, that were not possible in the traditional NMF setting. We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement.
Cédric Févotte (CNRS)
Cédric Févotte is a CNRS senior researcher at Institut de Recherche en Informatique de Toulouse (IRIT). Previously, he has been a CNRS researcher at Laboratoire Lagrange (Nice, 2013-2016) & at Télécom ParisTech (2007-2013), a research engineer at Mist-Technologies (the startup that became Audionamix, 2006-2007) and a postdoc at University of Cambridge (2003-2006). He holds MEng and PhD degrees in EECS from École Centrale de Nantes. His research interests concern statistical signal processing and machine learning, for inverse problems and source separation. He is a member of the IEEE Machine Learning for Signal Processing technical committee and an associate editor for the IEEE Transactions on Signal Processing. In 2014, he was the co-recipient of an IEEE Signal Processing Society Best Paper Award for his work on audio source separation using multichannel nonnegative matrix factorisation. He is the principal investigator of the ERC project FACTORY (New paradigms for latent factor estimation, 2016-2021).
Matthieu Kowalski (Univ Paris-Sud)
More from the Same Authors
2016 Poster: Optimal spectral transportation with application to music transcription »
Rémi Flamary · Cédric Févotte · Nicolas Courty · Valentin Emiya
2011 Poster: Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation »
Onur Dikmen · Cédric Févotte