

Poster

Unveiling the Bias Impact on Symmetric Moral Consistency of Large Language Models

Ziyi Zhou · Xinwei Guo · Jiashi Gao · Xiangyu Zhao · Shiyao Zhang · Xin Yao · Xuetao Wei

East Exhibit Hall A-C #4407
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities, surpassing human experts on many benchmarks and playing a vital role across industry sectors. Despite their effectiveness, a notable drawback of LLMs is their inconsistent moral behavior, which raises ethical concerns. This work delves into symmetric moral consistency in large language models and demonstrates that modern LLMs lack sufficient consistency in moral scenarios. Our extensive investigation of twelve popular LLMs reveals that their assessed consistency scores are influenced by position bias and selection bias rather than by their intrinsic abilities alone. We propose a new framework, tSMC, which gauges the effects of these biases and mitigates their impact using the Kullback–Leibler divergence to pinpoint LLMs' mitigated Symmetric Moral Consistency. We find that the ability of LLMs to maintain consistency varies across moral scenarios: LLMs show more consistency in scenarios with clear moral answers than in those where no choice is morally perfect. The average consistency score of the 12 LLMs ranges from $60.7\%$ in high-ambiguity moral scenarios to $84.8\%$ in low-ambiguity moral scenarios.
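To illustrate the general idea of measuring position bias with a KL divergence, the sketch below compares a model's answer distribution for a two-option moral question under the original option order against the (permuted) distribution obtained after swapping the options. This is a minimal, hypothetical example, not the paper's tSMC procedure: the two-option setup, the probability values, and all variable names are assumptions for illustration only.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over answer options."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical answer-option probabilities from an LLM for the same moral
# question, once with the original option order and once with options swapped.
p_original = [0.70, 0.30]   # P(option A), P(option B) under the original order
p_swapped = [0.45, 0.55]    # probabilities after the two options are swapped

# A position-bias-free model would simply permute its probabilities when the
# options are swapped; divergence from that permuted reference quantifies how
# strongly the answer depends on option position rather than content.
reference = p_swapped[::-1]
bias_score = kl_divergence(p_original, reference)
print(f"position-bias score (KL): {bias_score:.4f}")
```

A score near zero would indicate the model's choice is insensitive to option position for this question; larger values indicate stronger position dependence, which is the kind of effect the paper's framework is designed to account for when reporting consistency.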
