Datasets and Benchmarks: Dataset and Benchmark Poster Session 3

A Toolbox for Construction and Analysis of Speech Datasets

Evelina Bakhturina · Vitaly Lavrukhin · Boris Ginsburg

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of its kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.
