When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Zhijing Jin · Sydney Levine · Fernando Gonzalez Adauto · Ojasv Kamal · Maarten Sap · Mrinmaya Sachan · Rada Mihalcea · Josh Tenenbaum · Bernhard Schölkopf

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #706

AI systems are becoming increasingly intertwined with human life. In order to effectively collaborate with humans and ensure safety, AI systems need to be able to understand, interpret and predict human moral judgments and decisions. Human moral judgments are often guided by rules, but not always. A central challenge for AI safety is capturing the flexibility of the human moral mind: the ability to determine when a rule should be broken, especially in novel or unusual situations. In this paper, we present a novel challenge set, moral exception question answering (MoralExceptQA), consisting of cases that involve potentially permissible moral exceptions and inspired by recent moral psychology studies. Using a state-of-the-art large language model (LLM) as a basis, we propose a novel moral chain of thought (MoralCoT) prompting strategy that combines the strengths of LLMs with theories of moral reasoning developed in cognitive science to predict human moral judgments. MoralCoT outperforms seven existing LLMs by 6.2% F1, suggesting that modeling human reasoning might be necessary to capture the flexibility of the human moral mind. We also conduct a detailed error analysis to suggest directions for future work to improve AI safety using MoralExceptQA. Our data is open-sourced at https://huggingface.co/datasets/feradauto/MoralExceptQA and our code at https://github.com/feradauto/MoralCoT.

Author Information

Zhijing Jin (ETH Zürich)
Sydney Levine (MIT / Allen AI)
Fernando Gonzalez Adauto (ETH Zürich)
Ojasv Kamal (Indian Institute of Technology Kharagpur)

Final-year undergraduate at IIT Kharagpur, working in MoralNLP.

Maarten Sap (Carnegie Mellon University)

Maarten Sap is an assistant professor in Carnegie Mellon University's Language Technologies Department (CMU LTI). His research focuses on making NLP systems socially intelligent, and understanding social inequality and bias in language. He has presented his work at top-tier NLP and AI conferences, receiving a best short paper nomination at ACL 2019 and a best paper award at the WeCNLP 2020 summit. His research has been covered in the New York Times, Forbes, Fortune, and Vox. Additionally, he and his team won the inaugural 2017 Amazon Alexa Prize, a social chatbot competition. Before joining CMU, he was a postdoc/young investigator at the Allen Institute for AI (AI2) on project MOSAIC. He received his PhD from the University of Washington's Paul G. Allen School of Computer Science & Engineering, where he was advised by Yejin Choi and Noah Smith. In the past, he has interned at the Allen Institute for AI, working on social commonsense reasoning, and at Microsoft Research, working on deep learning models for understanding human cognition.

Mrinmaya Sachan (ETH Zürich)
Rada Mihalcea (University of Michigan)
Josh Tenenbaum (MIT)

Josh Tenenbaum is an Associate Professor of Computational Cognitive Science at MIT in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 1999, and was an Assistant Professor at Stanford University from 1999 to 2002. He studies learning and inference in humans and machines, with the twin goals of understanding human intelligence in computational terms and bringing computers closer to human capacities. He focuses on problems of inductive generalization from limited data -- learning concepts and word meanings, inferring causal relations or goals -- and learning abstract knowledge that supports these inductive leaps in the form of probabilistic generative models or 'intuitive theories'. He has also developed several novel machine learning methods inspired by human learning and perception, most notably Isomap, an approach to unsupervised learning of nonlinear manifolds in high-dimensional data. He has been Associate Editor for the journal Cognitive Science, has been active on program committees for the CogSci and NIPS conferences, and has co-organized a number of workshops, tutorials and summer schools in human and machine learning. Several of his papers have received outstanding paper awards or best student paper awards at the IEEE Computer Vision and Pattern Recognition (CVPR), NIPS, and Cognitive Science conferences. He is the recipient of the New Investigator Award from the Society for Mathematical Psychology (2005), the Early Investigator Award from the Society of Experimental Psychologists (2007), and the Distinguished Scientific Award for Early Career Contribution to Psychology (in the area of cognition and human learning) from the American Psychological Association (2008).

Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.

More from the Same Authors