Many single-channel signal decomposition techniques rely on a low-rank factorization of a time-frequency transform. In particular, nonnegative matrix factorization (NMF) of the spectrogram -- the (power) magnitude of the short-time Fourier transform (STFT) -- has been considered in many audio applications. In this setting, NMF with the Itakura-Saito divergence was shown to underlie a generative Gaussian composite model (GCM) of the STFT, a step forward from more empirical approaches based on ad-hoc transform and divergence specifications. Still, the GCM is not yet a generative model of the raw signal itself, but only of its STFT. The work presented in this paper fills this last gap by proposing a novel signal synthesis model with low-rank time-frequency structure. In particular, our new approach opens the door to multi-resolution representations, which were not possible in the traditional NMF setting. We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement.
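For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of NMF under the Itakura-Saito divergence, applied to a nonnegative matrix standing in for a power spectrogram. It is not the signal synthesis model proposed in the paper, nor either of its expectation-maximization algorithms. The function names `is_divergence` and `is_nmf`, the random test matrix, and the choice of multiplicative updates with exponent 1/2 are assumptions made for illustration.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries: sum(V/V_hat - log(V/V_hat) - 1)."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Approximate V (F x N, strictly positive) by W @ H with nonnegative factors.

    Assumption: multiplicative updates with exponent 1/2, one common
    choice for the Itakura-Saito divergence. Hypothetical helper, not
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    history = []
    for _ in range(n_iter):
        # Update activations H with the dictionary W held fixed.
        V_hat = W @ H + eps
        H *= ((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))) ** 0.5
        # Update dictionary W with the activations H held fixed.
        V_hat = W @ H + eps
        W *= (((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)) ** 0.5
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history


# Synthetic stand-in for a power spectrogram (assumption: any positive matrix works here).
rng = np.random.default_rng(0)
V = rng.random((20, 30)) + 0.1
W, H, history = is_nmf(V, rank=3)
```

In practice `V` would be the squared magnitude of an STFT; the columns of `W` then act as spectral templates and the rows of `H` as their activations over time.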
Cédric Févotte (CNRS, University of Toulouse)
Cédric Févotte is a CNRS research director with the Institut de Recherche en Informatique de Toulouse (IRIT). Previously, he was a CNRS researcher at Laboratoire Lagrange (Nice, 2013-2016) and Télécom ParisTech (2007-2013), a research engineer at Mist-Technologies (the startup that became Audionamix, 2006-2007) and a postdoc at the University of Cambridge (2003-2006). He holds MEng and PhD degrees in EECS from École Centrale de Nantes. His research interests concern statistical signal processing and machine learning, with particular interests in matrix factorisation, representation learning, source separation and recommender systems. He is currently the principal investigator of the European Research Council (ERC) project FACTORY (New paradigms for latent factor estimation, 2016-2022, 2M€).
Matthieu Kowalski (Univ Paris-Sud)
More from the Same Authors
2021 Poster: Unbalanced Optimal Transport through Non-negative Penalized Linear Regression
Laetitia Chapel · Rémi Flamary · Haoran Wu · Cédric Févotte · Gilles Gasso
2016 Poster: Optimal spectral transportation with application to music transcription
Rémi Flamary · Cédric Févotte · Nicolas Courty · Valentin Emiya
2011 Poster: Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation
Onur Dikmen · Cédric Févotte