Workshop: Data Centric AI

Dialectal Voice : An Open-Source Voice Dataset and Automatic Speech Recognition model for Moroccan Arabic dialectal


Under-represented languages such as dialectal Moroccan Arabic, commonly known as Darija, lack open systems capable of understanding them. Yet academia, private companies, and public institutions increasingly express a need for such systems in order to improve the human experience and support productivity. We present an automatic speech recognition system built with data-centric and transfer-learning approaches, comprising a voice dataset and a speech recognition model for Darija.