

Poster in Affinity Workshop: Muslims in ML

Towards Understanding Speaker Identity Coding in Data-driven Speech Models

Gasser Elbanna · Fabio Catania · Satra Ghosh

Keywords: [ speaker identity coding ] [ speaker perception ] [ representational similarity ] [ self-supervised learning ]


Abstract:

Speaker identity plays a significant role in human communication and is increasingly used in societal applications, many of them enabled by advances in machine learning. The representational spaces of current deep learning models, self-supervised models in particular, have shown strong performance on a variety of speech-related tasks. In this work, we demonstrate that these representations are significantly better for speaker identification than acoustic representations. We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks. By evaluating speaker identification accuracy across acoustic, phonemic, prosodic, and linguistic variants, we report similarities between model performance and human identity perception. These empirical findings both enhance the interpretability of these representational spaces and support using this family of models as candidates for studying speaker identity perception in humans.
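
The layer-wise analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one utterance-level embedding per layer has already been extracted, and it swaps in a simple nearest-centroid classifier as the probe. The function names, the `layer_embeddings` dictionary, and the train/test split are all hypothetical choices made for illustration.

```python
import numpy as np


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Speaker-identification accuracy of a nearest-centroid probe."""
    speakers = np.unique(train_y)
    # One mean embedding (centroid) per training speaker: (n_speakers, dim).
    centroids = np.stack([train_x[train_y == s].mean(axis=0) for s in speakers])
    # Euclidean distance from every test utterance to every centroid.
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=-1)
    predicted = speakers[dists.argmin(axis=1)]
    return float((predicted == test_y).mean())


def layerwise_speaker_probe(layer_embeddings, labels, train_fraction=0.5, seed=0):
    """Return {layer_name: accuracy} using the same random split for all layers.

    layer_embeddings: hypothetical dict mapping a layer name to an array of
        shape (n_utterances, dim), one row per utterance.
    labels: integer speaker label per utterance, shape (n_utterances,).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_train = int(train_fraction * len(labels))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return {
        name: nearest_centroid_accuracy(
            emb[train_idx], labels[train_idx], emb[test_idx], labels[test_idx]
        )
        for name, emb in layer_embeddings.items()
    }
```

Comparing the returned accuracies across layers, or against the same probe run on acoustic features, gives a rough picture of where speaker information is most accessible. Repeating the probe on test utterances with altered acoustic, phonemic, prosodic, or linguistic content would mirror the kind of variant comparison the abstract mentions, under the same assumptions.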
