Pidgin Science Voices: A Community-Driven Speech Corpus for Inclusive STEM Education
Abstract
Scientific knowledge in Nigeria is often restricted to academic English, leaving out millions of Nigerian Pidgin speakers (roughly 75 million people). More than 38 million Nigerian adults remain functionally illiterate, which creates a significant accessibility gap in STEM education. In our previous work, we collected English scientific text and translated it into Nigerian Pidgin, producing the data needed to build a machine translation system that can accurately translate this low-resource language. Here we extend that work from written translation to speech. Our goal is to build the first large-scale, science-focused Nigerian Pidgin speech corpus, enabling automatic speech recognition (ASR), text-to-speech (TTS), and voice-enabled learning tools that democratize scientific knowledge for underrepresented communities.