One of the most widely used methods for training self-supervised speaker verification systems is to optimize the speaker embedding network discriminatively, using pseudo-labels generated by a clustering algorithm. Although this pseudo-label-based training scheme has shown impressive performance, recent studies have shown that label noise can significantly degrade it. In this paper, we explore pseudo-labels produced by different clustering algorithms and conduct a fine-grained analysis of the relationship between pseudo-label quality and speaker verification performance. Our experimental results shed light on several previously unexplored and overlooked aspects of the pseudo-labels that can affect speaker verification performance. Moreover, we observe that self-supervised speaker verification performance depends heavily on multiple qualitative aspects of the clustering algorithm used to generate the pseudo-labels. Furthermore, we show that speaker verification performance can be severely degraded by overfitting to noisy pseudo-labels, and that the mixup strategy can mitigate the memorization effects of label noise.
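As a rough illustration of the mixup strategy mentioned in the abstract, the sketch below blends pairs of training examples and their pseudo-labels with a coefficient drawn from a Beta distribution. It is a minimal, generic version of mixup and is not taken from the paper: the function name `mixup_batch`, the `alpha` value, and the use of plain Python lists as features are all assumptions made for illustration.

```python
import random


def mixup_batch(features, pseudo_labels, num_clusters, alpha=0.2, rng=None):
    """Generic mixup on one batch (hypothetical helper, not the authors' code).

    features:      list of equal-length float lists (one per utterance)
    pseudo_labels: list of integer cluster ids in [0, num_clusters)
    alpha:         Beta(alpha, alpha) parameter; 0.2 is an assumed default
    """
    rng = rng or random.Random()
    n = len(features)

    # Turn hard pseudo-labels into one-hot targets so they can be blended.
    one_hot = [
        [1.0 if k == label else 0.0 for k in range(num_clusters)]
        for label in pseudo_labels
    ]

    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)

    # Pair each example with a randomly chosen partner from the same batch.
    perm = list(range(n))
    rng.shuffle(perm)

    # Convex combination of inputs and of targets with the same coefficient.
    mixed_x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(features[i], features[perm[i]])]
        for i in range(n)
    ]
    mixed_y = [
        [lam * a + (1.0 - lam) * b for a, b in zip(one_hot[i], one_hot[perm[i]])]
        for i in range(n)
    ]
    return mixed_x, mixed_y, lam
```

Because the targets become soft mixtures of two pseudo-labels, a single wrong cluster assignment contributes only part of the training signal for a mixed example, which is one intuition for why mixup may reduce memorization of noisy labels.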
Author Information
Abderrahim Fathan (Computer Research Institute of Montreal (CRIM), Montreal, Quebec, Canada)
Abderrahim Fathan received the B.Eng. and master's degrees in engineering sciences from Institut Polytechnique de Grenoble, Grenoble, France, in 2014 and 2016, respectively, and the M.Eng. degree in computer science from Polytechnique de Montreal, Montreal, QC, Canada. He is currently an Intern Researcher with the Computer Research Institute, Montreal, and has been a Ph.D. student with Concordia University, Montreal, since 2020. His research interests include machine learning, representation learning, self-supervised learning, anti-spoofing, speaker and speech recognition, and signal processing.
JAHANGIR ALAM (Computer Research Institute of Montreal (CRIM))
Woo Hyun Kang (Computer Research Institute of Montreal)
More from the Same Authors
-
2021 : A versatile and efficient approach to summarize speech into utterance-level representations »
Joao Monteiro · JAHANGIR ALAM · Tiago H Falk -
2018 : Coffee break + posters 2 »
Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp -
2018 : Coffee break + posters 1 »
Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli