
Workshop: Workshop on Federated Learning in the Age of Foundation Models in Conjunction with NeurIPS 2023 (FL@FM-NeurIPS'23)

Federated Learning for Speech Recognition: Revisiting Current Trends Towards Large-Scale ASR

Shams Azam · Martin Pelikan · Vitaly Feldman · Kunal Talwar · Jan Silovsky · Tatiana Likhomanenko

Keywords: [ Large models ] [ Transformer models ] [ differential privacy ] [ federated learning ] [ Speech Recognition ]

Sat 16 Dec 1:55 p.m. PST — 2:05 p.m. PST

Abstract: While automatic speech recognition (ASR) has witnessed remarkable achievements in recent years, it has not garnered widespread focus within the federated learning (FL) and differential privacy (DP) communities. Meanwhile, ASR is a well-suited benchmark for FL and DP, as there is (i) a natural data split across users by using speaker information; (ii) heterogeneous data across speakers, close to practical settings; (iii) a variety of sequence-to-sequence loss functions. Recent production-ready state-of-the-art models in ASR include $\textit{large}$ conformer and transformer models, whose optimization is known to pose challenges even for central training. While the main trends and benchmarks in FL and DP focus on $\textit{small}$ models, we show the necessity of disentangling optimization with $\textit{small}$ models from optimization with FL and DP, as optimization of large models in the context of FL and DP behaves differently. In this paper, we analyze the key FL parameters (optimizers, training from scratch or from a seed model pre-trained centrally, cohort size, data heterogeneity) and propose the $\textit{first}$ benchmark of $\textit{FL with DP}$ in the context of $\textit{large}$ models in ASR. We examine the applicability of prior results and present an overview of observed departures from the trends in prior works and from training different ASR models. Through this work, we provide researchers and practitioners in the fields of FL and DP with valuable insights into the fundamental differences that may arise when applying FL and DP research to large-scale ASR training.
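To make the "FL with DP" setting concrete, the sketch below shows one federated round in the DP-FedAvg style: per-user (per-speaker) model updates are clipped to a fixed L2 norm and the average is perturbed with Gaussian noise. This is a generic illustration of the training recipe family the abstract refers to, not the paper's actual implementation; all function names, constants, and the toy data are illustrative assumptions.

```python
# Illustrative sketch of one DP-FedAvg round (user-level DP):
# clip each per-user model delta, average, add Gaussian noise.
# Names and constants are hypothetical, not from the paper.
import numpy as np

def dp_fedavg_round(global_model, user_updates, clip_norm=1.0,
                    noise_mult=1.0, rng=None):
    """Aggregate per-user model deltas with L2 clipping and Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for delta in user_updates:
        norm = np.linalg.norm(delta)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # project onto L2 ball
        clipped.append(delta * scale)
    mean_update = np.mean(clipped, axis=0)
    # Noise std is calibrated to the clip norm and cohort size.
    noise = rng.normal(0.0, noise_mult * clip_norm / len(user_updates),
                       size=mean_update.shape)
    return global_model + mean_update + noise

# Toy usage: a cohort of 4 "speakers", each contributing a model delta.
model = np.zeros(3)
updates = [np.array([2.0, 0.0, 0.0]), np.array([0.0, 0.5, 0.0]),
           np.array([0.1, 0.1, 0.1]), np.array([0.0, 0.0, 3.0])]
new_model = dp_fedavg_round(model, updates, clip_norm=1.0, noise_mult=0.0)
```

With `noise_mult=0.0` the round reduces to plain federated averaging of clipped updates, which makes the clipping effect easy to inspect; a real DP run would choose the noise multiplier from an accounting of the target $(\varepsilon, \delta)$ privacy budget.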
