Developing a Non-Western Culture-Aware RAG Application
Abstract
Most artificial intelligence (AI) models are trained predominantly on Western datasets, so their responses often miss cultural nuances, misinterpret context, or give inappropriate recommendations to non-Western users. Language models in particular are trained on English-heavy web data, which leads to weaker results for many non-Western languages and cultures. Multilingual benchmarks show strong English scores but clear drops in other languages, indicating a systemic imbalance rather than isolated bugs [1]. Research in massively multilingual machine translation (MT) shows that these gaps can shrink when models and data prioritize low-resource languages [2]. Recent research acknowledges cultural bias in AI systems [3, 4]. However, most proposed solutions focus on post-hoc bias detection rather than on incorporating cultural knowledge while the AI application is being built. Our goal is to build an open-source production web application based on Retrieval-Augmented Generation (RAG) that centers non-Western languages and is freely available to everyone. RAG systems have demonstrated effectiveness in domain-specific applications [5, 6]. However, less work has been done on RAG applications built on non-Western cultural data.