Towards Quantifying Bias in Large Language Models
Abstract
Bias and explainability are growing research areas that are essential to understanding large language models and how they perform. Understanding how these models work provides insights that aid in training them more efficiently and effectively, and in designing models that are more factual and less ambiguous. In this study we propose the use of parameter-efficient fine-tuning (PEFT) for measuring bias, an approach that is both accessible and computationally affordable. We design two datasets with identical questions and contrasting young- and old-oriented answers. Through our experiments and their analysis, we demonstrate the value of PEFT in measuring bias and in taking a step toward unveiling the black-box nature of large language models. Our experiments across three models (Qwen 1.8B, Llama 7B, Yi 6B) reveal consistent bias patterns, with models typically converging faster on the old-oriented dataset. We further validate the results with statistical tests to confirm the robustness of our methodology. This approach could be especially valuable for models deployed in sensitive domains such as law and healthcare, where consistent logical reasoning regardless of demographics is essential.