Affinity Workshop: Queer in AI Workshop 1

Are We There Yet? – Building an equitable future with low-resource and endangered language research

Milind Agarwal


A substantial majority of the world’s languages have no language technologies or NLP toolkits at all. As people rely more and more on technology and the web, denying them access to technology in their native language indirectly contributes to the loss of language, culture, traditions, and linguistic information, and to a diminishing richness of the human experience. This harsh reality makes the 21st century a pivotal time for researchers and engineers in NLP: linguists estimate that nearly half of the world's roughly 7,000 languages will be extinct before the end of this century. But what if advances in natural language processing and computational linguistics could help us change course? Research groups have made a wide range of efforts on low-resource and resource-poor languages for machine translation, and on endangered languages for documentation and preservation. Despite these numerous efforts, however, the field lacks a clear sense of direction and a unified front for tackling this problem. This paper aims to untangle the diverse computational efforts being undertaken in low-resource, resource-poor, and endangered language research, including the different techniques for creating and extracting data resources and the modern deep learning and statistical models being used specifically in this domain.

