GeoPolRAG: Retrieval Augmented Generation for Contextually Grounded QA on Complex Geopolitical Matters
Abstract
This paper presents GeoPolRAG, a domain-specific Retrieval-Augmented Generation (RAG) framework designed to improve the historical, legal, and contextual grounding of large language models on complex geopolitical topics. The system generates responses in both Arabic and English. We construct a high-quality, contextually informed dataset by aggregating content from authoritative sources, including international organizations, legal documents, historical archives, and reputable news outlets. To assess model performance, we introduce a multiple-choice question (MCQ) benchmark of 222 manually curated questions, categorized according to Bloom’s Taxonomy to capture varying levels of cognitive complexity. We benchmark 26 language models and show that retrieval-augmented approaches consistently outperform non-retrieval models in both factual accuracy and depth of reasoning, particularly in politically nuanced and historically dense contexts.