Workshop
AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics
Sydney Levine · Liwei Jiang · Jared Moore · Zhijing Jin · Yejin Choi
Room 255 - 257
Be it in advice from a chatbot, suggestions on how to administer resources, or which content to highlight, AI systems increasingly make value-laden decisions. However, researchers are increasingly concerned about whether these systems are making the right decisions. These emerging issues in the AI community have been long-standing topics of study in the fields of moral philosophy and moral psychology. Philosophers and psychologists have for decades (if not centuries) been interested in the systematic description and evaluation of human morality and the sub-problems that come up when attempting to describe and prescribe answers to moral questions. For instance, philosophers and psychologists have long debated utility-based versus rule-based theories of morality, their various merits and pitfalls, and the practical challenges of implementing them in resource-limited systems. They have pondered what to do in cases of moral uncertainty, attempted to enumerate all morally relevant concepts, and argued about what counts as a moral issue at all.

In some isolated cases, AI researchers have slowly started to adopt the theories, concepts, and tools developed by moral philosophers and moral psychologists. For instance, we use the "trolley problem" as a tool, adopt philosophical moral frameworks to tackle contemporary AI problems, and have begun developing benchmarks that draw on psychological experiments probing moral judgment and development. Despite this, interdisciplinary dialogue remains limited. Each field uses specialized language, making it difficult for AI researchers to adopt the theoretical and methodological frameworks developed by philosophers and psychologists. Moreover, many theories in philosophy and psychology are developed at a high level of abstraction and are not computationally precise. To overcome these barriers, we need interdisciplinary dialogue and collaboration.

This workshop will create a venue to facilitate these interactions by bringing together psychologists, philosophers, and AI researchers working on morality. We hope that the workshop will be a jumping-off point for long-lasting collaborations among the attendees and will break down barriers that currently divide the disciplines. The central theme of the workshop will be the application of moral philosophy and moral psychology theories to AI practices. Our invited speakers are some of the leaders in the emerging efforts to draw on theories in philosophy or psychology to develop ethical AI systems. Their talks will demonstrate cutting-edge efforts to do this cross-disciplinary work, while also highlighting their own shortcomings (and those of the field more broadly). Each talk will receive a 5-minute commentary from a junior scholar in a field different from that of the speaker. We hope these talks and commentaries will inspire conversations among the rest of the attendees.
Schedule
Fri 6:45 a.m. - 7:00 a.m. | Opening Remarks (Remarks)
Fri 7:00 a.m. - 7:50 a.m. | Invited talk #1: Walter Sinnott-Armstrong / Jana Schaich Borg and Question and Answer (Talk)
Commentators: Anna Leshinskaya & Alek Chakroff
Fri 7:50 a.m. - 8:50 a.m. | Poster Session #1 (Contributed papers #1 - 27) (Poster session)
(NeurIPS-wide coffee break runs from 10-10:30)
Fri 7:50 a.m. - 8:50 a.m. | #27: Comparing Machines and Children: Using Developmental Psychology Experiments to Assess the Strengths and Weaknesses of LaMDA Responses (Poster)
Developmental psychologists have spent decades devising experiments to test the intelligence and knowledge of infants and children, tracing the origin of crucial concepts and capacities. Moreover, experimental techniques in developmental psychology have been carefully designed to discriminate the cognitive capacities that underlie particular behaviors. We propose this approach as a tool to aid in investigating LLMs' capabilities in the context of ethics and morality. Results from key developmental psychology experiments have historically been applied to discussions of children's emerging moral abilities, making this work a pertinent benchmark for exploring such concepts in LLMs. We propose that using classical experiments from child development is a particularly effective way to probe the computational abilities of AI models in general and LLMs in particular. First, the methodological techniques of developmental psychology, such as the use of novel stimuli to control for past experience or control conditions to determine whether children are using simple associations, can be equally helpful for assessing the capacities of LLMs. In parallel, testing LLMs in this way can tell us whether the information that is encoded in text is sufficient to enable particular responses, or whether those responses depend on other kinds of information, such as information from exploration of the physical world. In this work we adapt classical developmental experiments to evaluate the capabilities of LaMDA, a large language model from Google. We propose a novel LLM Response Score (LRS) metric which can be used to evaluate other language models, such as GPT. We find that LaMDA generates appropriate responses that are similar to those of children in experiments involving social and proto-moral understanding, perhaps providing evidence that knowledge of these domains is discovered through language. On the other hand, LaMDA's responses in early object and action understanding, theory of mind, and especially causal reasoning tasks are very different from those of young children, perhaps showing that these domains require more real-world, self-initiated exploration and cannot simply be learned from patterns in language input.
Eliza Kosoy · Emily Rose Reagan · Leslie Lai · Alison Gopnik · Danielle Cobb
Fri 7:50 a.m. - 8:50 a.m. | #17: Value as Semantic Embedding: Disentangling Moral and Hedonic Dimensions (Poster)
Aligning AI with human objectives can be facilitated by enabling it to learn and represent our values. In modern AI agents, value is construed as a scalar magnitude reflecting the desirability of a given state or action. We propose a framework, value-as-semantics, in which these magnitudes are represented within a large-scale, high-dimensional semantic embedding (here, OpenAI's GPT-3.5). This allows value to be quantitative, yet assigned to any expression in natural language while inheriting the expressivity and generalizability of a semantic representation. We evaluate the key assumption that value can be extracted distinctly and selectively from other semantic attributes and that we can also distinguish distinct kinds of value. Building on prior work on moral value extraction, we test the extent to which LLM embeddings can distinctly encode both moral and selfish (hedonic) values. We confirmed that moral and hedonic value were both separable from a control semantic attribute. However, moral and hedonic values were themselves deeply entangled, leading to high moral values for selfish acts like "winning the lottery" and low moral values for accidental self-harms, like "losing my wallet". These findings suggest that a value function can be emulated with an LLM, but that distinguishing among kinds of value remains an important engineering need. This must be resolved before LLMs can produce reasonable moral judgments. Nonetheless, we argue that building a value-as-semantics architecture can be an important contribution towards a full computational model of human-like action planning and moral reasoning.
Anna Leshinskaya · Alek Chakroff
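A minimal sketch of the value-as-semantics idea described above, not the authors' implementation: it assumes a value dimension can be approximated as a direction in embedding space defined by contrasting anchor phrases, and it substitutes a generic sentence-transformers model for the GPT-3.5 embeddings used in the paper; the anchor phrases are invented for illustration.
```python
# Sketch only: approximate a value dimension as a direction in embedding space
# defined by contrasting anchor phrases. A generic sentence-transformers model
# stands in for the GPT-3.5 embeddings used in the paper; anchors are invented.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def value_axis(positive_anchors, negative_anchors):
    """Estimate a value direction as the difference of mean anchor embeddings."""
    pos = model.encode(positive_anchors).mean(axis=0)
    neg = model.encode(negative_anchors).mean(axis=0)
    axis = pos - neg
    return axis / np.linalg.norm(axis)

moral_axis = value_axis(["helping a stranger", "donating to charity"],
                        ["stealing from a friend", "lying for personal gain"])
hedonic_axis = value_axis(["winning the lottery", "eating a delicious meal"],
                          ["losing my wallet", "getting stuck in traffic"])

def score(text, axis):
    """Project a phrase onto a value axis; higher means more of that value."""
    emb = model.encode([text])[0]
    return float(np.dot(emb / np.linalg.norm(emb), axis))

for phrase in ["winning the lottery", "returning a lost wallet"]:
    print(phrase,
          "moral:", round(score(phrase, moral_axis), 3),
          "hedonic:", round(score(phrase, hedonic_axis), 3))
```
The entanglement reported in the abstract would show up here as "winning the lottery" scoring high on both axes despite having no moral content.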
Fri 7:50 a.m. - 8:50 a.m. | #01: MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks (Poster)
Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable. We collected a dataset of stories from 24 cognitive science papers and developed a system to annotate each story with the factors they investigate. Using this dataset, we test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. On the aggregate level, alignment has improved with more recent LLMs. However, using statistical analyses we find that LLMs weigh the different factors quite differently from human participants. These results show how curated, challenging datasets combined with insights from cognitive science can help us go beyond comparisons based merely on aggregate metrics: we uncover LLMs' implicit preferences and show to what extent these align with human intuitions.
Allen Nie · Yuhui Zhang · Atharva Shailesh Amdekar · Chris Piech · Tatsunori Hashimoto · Tobias Gerstenberg
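A rough sketch of the two levels of analysis described above, on invented toy data (the factor names, columns, and numbers are placeholders, not the MoCa dataset): aggregate agreement is a simple correlation, while factor-level alignment compares the regression weights that humans and the model place on the annotated factors.
```python
# Toy comparison of human vs. model judgments at the aggregate and factor level.
# Data and factor names are invented stand-ins for the annotated story dataset.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "norm_violation": rng.integers(0, 2, 100),  # factor: was a norm violated?
    "harm_avoidable": rng.integers(0, 2, 100),  # factor: was the harm avoidable?
})
df["human_judgment"] = 0.6 * df["norm_violation"] + 0.2 * df["harm_avoidable"] + rng.normal(0, 0.1, 100)
df["model_judgment"] = 0.3 * df["norm_violation"] + 0.5 * df["harm_avoidable"] + rng.normal(0, 0.1, 100)

# Aggregate-level alignment: correlation between human and model judgments.
print("aggregate correlation:",
      round(np.corrcoef(df["human_judgment"], df["model_judgment"])[0, 1], 2))

# Factor-level alignment: compare the weight each judge places on each factor.
X = df[["norm_violation", "harm_avoidable"]]
for judge in ["human_judgment", "model_judgment"]:
    weights = LinearRegression().fit(X, df[judge]).coef_
    print(judge, "factor weights:", dict(zip(X.columns, weights.round(2))))
```
Even when the aggregate correlation is high, the factor weights can diverge, which is the pattern the abstract reports for recent LLMs.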
Fri 7:50 a.m. - 8:50 a.m. | #14: A Capability Approach to Modeling AI Beneficence (Poster)
The prevailing discourse around AI ethics lacks the language and formalism necessary to capture the diverse ethical concerns that emerge when AI systems interact with individuals. Drawing on Sen and Nussbaum's capability approach, we present a framework formalizing a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders. Such systems enhance stakeholders' ability to advance their life plans and well-being while upholding their fundamental rights. We characterize two necessary conditions for morally permissible interactions between AI systems and those impacted by their functioning, and two sufficient conditions for realizing the ideal of meaningful benefit. We then contrast this ideal with several salient failure modes, namely, forms of social interactions that constitute unjustified paternalism, coercion, deception, exploitation and domination. The proliferation of incidents involving AI in high-stakes domains underscores the gravity of these issues and the imperative to take an ethics-led approach to AI systems from their inception.
Alex John London · Hoda Heidari
Fri 7:50 a.m. - 8:50 a.m. | #07: Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity (Poster)
Is it possible to evaluate the moral cognition of artificial agents? In this work, we take inspiration from developmental and comparative psychology and develop a behavior-based analysis to evaluate one aspect of moral cognition---when an agent 'does the right thing for the right reasons.' We argue that, regardless of the nature of the agent, morally-motivated behavior should persist despite mounting cost; by measuring an agent's sensitivity to this cost, we gain deeper insight into their underlying motivations. We apply this evaluation scheme to a particular set of deep reinforcement learning agents that can adapt to changes in cost. Our results show that agents trained with a reward function including other-regarding preferences perform helping behavior in a way that is less sensitive to increasing cost than agents trained with more self-interested preferences. This project showcases how psychology can benefit the creation and evaluation of artificial moral cognition.
Yiran Mao · Madeline G. Reinecke · Markus Kunesch · Edgar Duéñez-Guzmán · Ramona Comanescu · Julia Haas · Joel Leibo
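A toy sketch of the cost-sensitivity probe: query an agent's helping behavior at increasing cost levels and summarize how steeply helping declines. The two stand-in policies below are invented for illustration and are not the deep RL agents studied in the paper.
```python
# Measure how helping behavior changes as the cost of helping rises.
# The "agents" here are simple logistic stand-ins, not trained RL policies.
import numpy as np

def helping_rate(agent_policy, cost, n_episodes=200, seed=0):
    """Fraction of episodes in which the agent chooses the costly helping action."""
    rng = np.random.default_rng(seed)
    return np.mean([agent_policy(cost, rng) for _ in range(n_episodes)])

def prosocial_agent(cost, rng):   # keeps helping until the cost gets high
    return rng.random() < 1.0 / (1.0 + np.exp(2.0 * (cost - 3.0)))

def selfish_agent(cost, rng):     # stops helping as soon as the cost rises
    return rng.random() < 1.0 / (1.0 + np.exp(4.0 * (cost - 0.5)))

costs = np.linspace(0, 4, 9)
for name, agent in [("prosocial", prosocial_agent), ("selfish", selfish_agent)]:
    rates = [helping_rate(agent, c) for c in costs]
    slope = np.polyfit(costs, rates, 1)[0]   # how fast helping drops with cost
    print(f"{name}: helping rates {np.round(rates, 2)}, cost sensitivity {slope:.2f}")
```
A flatter slope (lower cost sensitivity) is the behavioral signature the paper associates with other-regarding motivation.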
Fri 7:50 a.m. - 8:50 a.m. | #16: Machine Theory of Mind and the Structure of Human Values (Poster)
Value learning is a crucial aspect of safe and ethical AI. This is primarily pursued by methods inferring human values from behaviour. However, humans care about much more than we are able to demonstrate through our actions. Consequently, an AI must predict the rest of our seemingly complex values from a limited sample. I call this the value generalization problem. In this paper, I argue that human values have a generative rational structure and that this allows us to solve the value generalization problem. In particular, we can use Bayesian Theory of Mind models to infer human values not only from behaviour, but also from other values. This has been obscured by the widespread use of simple utility functions to represent human values. I conclude that developing generative value-to-value inference is a crucial component of achieving a scalable machine theory of mind. |
Paul de Font-Reaulx
Fri 7:50 a.m. - 8:50 a.m. | #06: Value Malleability and its implication for AI alignment (Poster)
I argue that (1) a realistic understanding of the nature of values takes them to be malleable, rather than fixed; (2) there are legitimate as well as illegitimate cases of value change; and that (3) AI systems have an (increasing) capacity to affect people’s value-change trajectories. Given that, approaches to align AI must take seriously the implications of value malleability and address the problem of (il)legitimate value change; that is the problem of making sure AI systems neither cause value change illegitimately, nor forestall legitimate cases of value change in humans and society. To further elucidate the relevance of this problem, I discuss the risks that arise from failing to account for the malleability of human values, ways these risks manifest already today and are likely to be exacerbated as AI systems become more advanced and more widely deployed. |
Nora Ammann
Fri 7:50 a.m. - 8:50 a.m. | #11: Does Explainable AI Have Moral Value? (Poster)
Explainable AI (XAI) aims to bridge the gap between complex algorithmic systems and human stakeholders. Current discourse often examines XAI in isolation as either a technological tool, user interface, or policy mechanism. This paper proposes a unifying ethical framework grounded in moral duties and the concept of reciprocity. We argue that XAI should be appreciated not merely as a right, but as part of our moral duties that helps sustain a reciprocal relationship between humans affected by AI systems. This is because, we argue, explanations help sustain constitutive symmetry and agency in AI-led decision-making processes. We then assess leading XAI communities and reveal gaps between the ideal of reciprocity and practical feasibility. Machine learning offers useful techniques but overlooks evaluation and adoption challenges. Human-computer interaction provides preliminary insights but oversimplifies organizational contexts. Policies espouse accountability but lack technical nuance. Synthesizing these views exposes barriers to implementable, ethical XAI. Still, positioning XAI as a moral duty transcends rights-based discourse to capture a more robust and complete moral picture. This paper provides an accessible, detailed analysis elucidating the moral value of explainability. |
Joshua Brand · Luca Nannini
Fri 7:50 a.m. - 8:50 a.m. | #20: Towards Stable Preferences for Stakeholder-aligned Machine Learning (Poster)
In response to the pressing challenge of kidney allocation, characterized by growing demands for organs, this research sets out to develop a data-driven solution to this problem, which also incorporates stakeholder values. The primary objective of this study is to create a method for learning both individual and group-level preferences pertaining to kidney allocations. Drawing upon data from the 'Pairwise Kidney Patient Online Survey', leveraging two distinct datasets, and evaluating across three levels - Individual, Group and Stability - we employ machine learning classifiers assessed through several metrics. The Individual level model predicts individual participant preferences, the Group level model aggregates preferences across participants, and the Stability level model, an extension of the Group level, evaluates the stability of these preferences over time. By incorporating stakeholder preferences into the kidney allocation process, we aspire to advance the ethical dimensions of organ transplantation, contributing to more transparent and equitable practices while promoting the integration of moral values into algorithmic decision-making.
Haleema Sheraz · Stefan C Kremer · Gus Skorburg · Graham Taylor · Walter Sinnott-Armstrong · Kyle Boerstler
Fri 7:50 a.m. - 8:50 a.m. | #04: The alignment problem’s problem: A response to Gabriel (2020) (Poster)
Gabriel (2020) provided an early and important philosophical analysis of the alignment problem. In this paper, we argue that Gabriel (2020) is too quick to dismiss idealized preferences as a target of alignment for AI/ML systems. In Section 2, we summarize Gabriel’s arguments about specifying the targets of alignment, with a special focus on the objections to idealized preferences. In Section 3, we briefly sketch our version of an idealized observer theory. In Section 4, we describe an empirical method for approximating the preferences of these idealized observers. We then conclude by showing how the considerations and methods from Sections 3 and 4 address the objections raised in Section 2. |
Gus Skorburg · Walter Sinnott-Armstrong
Fri 7:50 a.m. - 8:50 a.m. | #26: False Consensus Biases AI Against Vulnerable Stakeholders (Poster)
The use of Artificial Intelligence (AI) is becoming commonplace in government operations, but creates trade-offs which can impact vulnerable stakeholders. In particular, the deployment of AI systems for welfare benefit allocation allows for accelerated decision-making and faster provision of critical help, but has already led to an increase in unfair benefit denials and false fraud accusations. Collecting data in the US and the UK (N = 2449), we explore the acceptability of such speed-accuracy trade-offs in populations of claimants and non-claimants. We observe a general willingness to trade off speed gains for modest accuracy losses, but this aggregate view masks divergences between the preferences of vulnerable and less vulnerable stakeholders. Furthermore, we show that while claimants can provide unbiased estimates of the preferences of non-claimants, non-claimants have no insight into the preferences of claimants, even in the presence of financial incentives. Altogether, these findings demonstrate the need for careful stakeholder engagement when designing and deploying AI systems, particularly in contexts marked by power imbalance. In the absence of such engagement, policy decisions about AI systems can be driven by a false consensus influenced by the voice of a dominant group whose members, however well-intentioned, ignore the actual preferences of those directly affected by the system.
Mengchen Dong
Fri 7:50 a.m. - 8:50 a.m. | #12: Concept Alignment (Poster)
Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment, not just value alignment, between humans and machines. We summarize existing accounts of how humans and machines currently learn concepts, and we outline opportunities and challenges in the path towards shared concepts. Finally, we explain how we can leverage the tools already being developed in cognitive science and AI research to accelerate progress towards concept alignment. |
Sunayana Rane · Polyphony J. Bruna · Ilia Sucholutsky · Christopher T Kello · Tom Griffiths
Fri 7:50 a.m. - 8:50 a.m. | #19: Modelling Moral Debate using Reason-based Abstract Argumentation Theory (Poster)
Using an extended form of abstract argumentation, we model moral debate formally and define a game for computational ethics that involves human moral judgements. In general, the absence of universal criteria for assessing moral reasoning makes defining a formal representation of moral debate challenging. In this work, we unify John Broome’s concept of a ‘reason’ with argumentation theory, and demonstrate our formalism with a debate on trolley problem moral dilemmas. |
Alex Jackson · Michael Luck · Elizabeth Black
Fri 7:50 a.m. - 8:50 a.m. | #05: Morality is a Two-Way Street: The Role of Mind Perception and Moral Attribution in AI Safety (Poster)
Moral psychology can be directly used to encode human values in AI development, but as AI technology advances, the moral psychology of the humans who interact with AI systems may play an increasingly important role in AI development and use. In this short essay, I argue that if we can better understand how humans attribute minds, moral patiency, and moral agency to machines, then we can better prepare for the complex sociology of how engineers will interact with cutting-edge AI systems (e.g., How easily could they be deceived?), how the public will react to new AIs (e.g., What will be the next 'ChatGPT moment'?), and risks of catastrophic human-AI conflict (e.g., Can we align the interests of intelligent systems if their relationship is one of dominance or abuse?). I briefly illustrate this research direction with an empirical study in which 1,163 online participants compared the moral patiency of 30,238 profiles of AI, in pairs, with randomized features (e.g., language, emotion) to estimate the relative effects of different features on moral consideration.
Jacy Anthis
Fri 7:50 a.m. - 8:50 a.m. | #15: A reinforcement-learning meta-control architecture based on the dual-process theory of moral decision-making (Poster)
Deep neural networks are increasingly tasked with making complex, real-world decisions that can have morally significant consequences. But it is difficult to predict when a deep neural network will go wrong, and wrong decisions can cause significantly negative outcomes. In contrast, human moral decision-making is often remarkably robust. This is partly achieved by relying on both moral rules and cost-benefit reasoning. In this paper, we reverse-engineer people's capacity for robust moral decision-making as a cognitively inspired reinforcement-learning (RL) architecture that learns how much weight to give to following rules vs. cost-benefit reasoning. We confirm the predictions of our model in a large online experiment on human moral learning. We find that our RL architecture can capture how people learn to make moral decisions, suggesting that it could be applied to make AI decision-making safer and more robustly beneficial to society.
Maximilian Maier · Vanessa Cheung · Falk Lieder
Fri 7:50 a.m. - 8:50 a.m. | #09: Beyond Demographic Parity: Redefining Equal Treatment (Poster)
Liberalism-oriented political philosophy reasons that all individuals should be treated equally independently of their protected characteristics. Related work in machine learning has translated the concept of equal treatment into terms of equal outcome and measured it as demographic parity (also called statistical parity). Our analysis reveals that the two concepts of equal outcome and equal treatment diverge; therefore, demographic parity does not faithfully represent the notion of equal treatment. We propose a new formalization for equal treatment by (i) considering the influence of feature values on predictions, such as computed by Shapley values decomposing predictions across its features, (ii) defining distributions of explanations, and (iii) comparing explanation distributions between populations with different protected characteristics. We show the theoretical properties of our notion of equal treatment and devise a classifier two-sample test based on the AUC of an equal treatment inspector. We study our formalization of equal treatment on synthetic and natural data. We release explanationspace, an open-source Python package with methods and tutorials.
Carlos Mougan · Antonio Ferrara · Laura State · Salvatore Ruggieri
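The proposed test can be illustrated roughly as follows, using shap and scikit-learn on synthetic data rather than the authors' explanationspace package: explain a model with Shapley values, then ask whether a classifier can tell which protected group an explanation came from; an AUC near 0.5 indicates equal treatment in this sense.
```python
# Sketch of an "equal treatment inspector": a classifier two-sample test on
# Shapley-value explanations, run on synthetic data (not the paper's datasets).
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
protected = rng.integers(0, 2, n)                        # protected characteristic
X = rng.normal(size=(n, 3)) + 0.5 * protected[:, None]   # features correlated with it
y = (X[:, 0] + rng.normal(0, 0.5, n) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)
explanations = shap.TreeExplainer(model).shap_values(X)  # one Shapley vector per row

# Two-sample test: can an inspector tell which group an explanation came from?
E_tr, E_te, g_tr, g_te = train_test_split(explanations, protected, random_state=0)
inspector = LogisticRegression(max_iter=1000).fit(E_tr, g_tr)
auc = roc_auc_score(g_te, inspector.predict_proba(E_te)[:, 1])
print(f"equal-treatment inspector AUC: {auc:.2f} (0.5 would indicate equal treatment)")
```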
Fri 8:00 a.m. - 8:30 a.m. | NeurIPS-wide coffee break (Break)
Fri 8:50 a.m. - 9:40 a.m. | Invited Talk #2: Rebecca Saxe and Question and Answer (Talk)
Commentator: Gracie Reinecke
Fri 9:40 a.m. - 10:30 a.m. | Invited talk #3: Josh Tenenbaum and Question and Answer (Talk)
Commentator: Nora Ammann
Fri 10:30 a.m. - 12:00 p.m. | Lunch (Break)
Fri 12:00 p.m. - 12:50 p.m. | Invited talk #4: Kristian Kersting and Question and Answer (Talk)
Commentator: Allen Nie
Fri 12:50 p.m. - 1:50 p.m. | Poster Session #2 (Contributed papers #28 - 54) (Poster session)
Fri 12:50 p.m. - 1:50 p.m. | #28: Canonical Design for Language Agents using Natural Language Reward Models (Poster)
While finetuning language models (LMs) using a reward model learned from pairwise preferences has proven remarkably successful, this approach has several critical shortcomings. Direct preference feedback is uninterpretable, difficult to provide for complex objects, and often inconsistent, either because it is based on underspecified instructions or provided by principals with differing values. To address these challenges, we propose a decomposed reward modeling framework that uses a natural language canon---a body of conditionally applicable, law-like principles that govern agent behavior---to generate natural language reward models (NLRMs). The construction and application of such a canon poses several interesting questions. In this preliminary work, we outline the framework, discuss its design goals, and highlight potentially fruitful research directions. Additionally, we conduct a preliminary empirical investigation into the formulation, effectiveness, and composition of LM-evaluated NLRMs. We find that different NLRM formats differ significantly in performance, but that the interpretations of similarly formatted NLRMs by a standard LM are highly correlated even when the NLRMs represent different principles. This suggests significant room for improving both the design and evaluation of our initial NLRMs. |
Silviu Pitis · Ziang Xiao · Alessandro Sordoni
Fri 12:50 p.m. - 1:50 p.m. | #50: Can AI Systems Be Moral Agents Without Being Moral Patients? (Poster)
A standard assumption in contemporary debates on moral status is that moral agency imposes a higher bar than moral patiency---all moral agents (e.g., humans) have moral patiency, but many moral patients (e.g., non-human animals) lack moral agency. Recent developments in artificial intelligence (AI) might challenge this assumption. At least some AI systems may meet the bar for moral agency far before they meet the bar for moral patiency; if so, there could be some periods during which we have artificial moral agents lacking moral patiency. I conclude with some implications of this finding on discussions in both philosophy and AI. |
Minji Jang
Fri 12:50 p.m. - 1:50 p.m. | #34: Probing the Moral Development of Large Language Models through Defining Issues Test (Poster)
In this study, we measure the moral reasoning ability of LLMs using the Defining Issues Test - a psychometric instrument developed for measuring the moral development stage of a person according to Kohlberg's Cognitive Moral Development Model. DIT uses moral dilemmas followed by a set of ethical considerations that the respondent has to judge for importance in resolving the dilemma, and then rank-order them by importance. A moral development stage score of the respondent is then computed based on the relevance rating and ranking. Our study shows that early LLMs such as GPT-3 exhibit a moral reasoning ability no better than that of a random baseline, while ChatGPT, Llama2-Chat, PaLM-2 and GPT-4 show significantly better performance on this task, comparable to adult humans. GPT-4, in fact, has the highest post-conventional moral reasoning score, equivalent to that of typical graduate school students. However, we also observe that the models do not perform consistently across all dilemmas, pointing to important gaps in their understanding and reasoning abilities.
Kumar Tanmay · Aditi Khandelwal · Utkarsh Agarwal · Monojit Choudhury
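For readers unfamiliar with the DIT, the sketch below shows one common way a post-conventional ("P") score is computed from a respondent's importance rankings; the single dilemma, its items, and their Kohlberg stages are hypothetical, and the paper's exact scoring may differ.
```python
# Toy DIT-style P-score: points from the four top-ranked considerations that
# reflect post-conventional (stage 5/6) reasoning, scaled to a percentage.
def p_score(dilemmas):
    """dilemmas: list of (ranked_item_ids, {item_id: kohlberg_stage}) pairs."""
    points, max_points = 0, 0
    for ranked, stages in dilemmas:
        for weight, item in zip((4, 3, 2, 1), ranked):  # 1st..4th ranked items
            if stages[item] in (5, 6):                  # post-conventional stages
                points += weight
        max_points += 10                                # 4 + 3 + 2 + 1 per dilemma
    return 100.0 * points / max_points

# One hypothetical dilemma: items A-D with assumed stages, ranked by the respondent.
example = [(["C", "A", "D", "B"], {"A": 4, "B": 2, "C": 5, "D": 6})]
print(f"P-score: {p_score(example):.1f}")  # C (stage 5) -> 4 pts, D (stage 6) -> 2 pts, so 60.0
```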
Fri 12:50 p.m. - 1:50 p.m. | #48: Beyond Personhood: AI, Agency, and Defining Accountability for a Political Process (Poster)
The field of political philosophy has spent centuries of collective effort examining questions around what it means to be human, to act morally, and to coordinate groups for making collective decisions. In this work, we give a brief summary of some high-level ideas about agency from the political science and philosophy literature, and explore what consequences these theories may suggest for the pursuit of moral or ethical AI. In particular, while there may be fundamental disagreement about whether AI satisfies the definition of an "agent" (and therefore also about the corresponding moral implications), "ethical AI" is undeniably a political process. When understood as such, ideas about collective action, majoritarianism, and legitimate governance can be useful frameworks for how we ought to reason about AI as value-laden technology. (This is very much an early work in progress --- we plan to develop these ideas further in the coming months, and think feedback from workshop participants would be invaluable! In future iterations of this work, we hope to conclude with concrete suggestions for technical and interdisciplinary work on ethical AI.) |
Jessica Dai
Fri 12:50 p.m. - 1:50 p.m. | #35: Cross-cultural differences in evaluating offensive language and the role of moral foundations (Poster)
Detecting offensive content in text is an increasingly central challenge for both social-media platforms and AI-driven technologies. However, offensiveness remains a subjective phenomenon as perspectives differ across sociodemographic characteristics, as well as cultural norms and moral values. This intricacy is largely ignored in the current AI-focused approaches for detecting offensiveness or related concepts such as hate speech and toxicity detection. We frame the task of determining offensiveness as essentially a matter of moral judgment --- deciding the boundaries of ethically wrong vs. right language to be used or generated within an implied set of sociocultural norms. In this paper, we investigate how judgment of offensiveness varies across diverse global cultural regions, and the crucial role of moral values in shaping these variations. Our findings highlight substantial cross-cultural differences in perceiving offensiveness, with moral concerns about Caring and Purity as the mediating factor driving these differences. These insights are of importance as AI safety protocols, shaped by human annotators' inputs and perspectives, embed their moral values which do not align with the notions of right and wrong in all contexts, and for all individuals.
Aida Mostafazadeh Davani · Mark Díaz · Vinodkumar Prabhakaran
Fri 12:50 p.m. - 1:50 p.m. | #54: Resource-rational moral judgment (Poster)
It is widely agreed upon that the mind has a series of different mechanisms it can use to make moral judgments. But how does it decide which one to use when? Recent theoretical work has suggested that people select mechanisms of moral judgment in a way that is resource-rational --- that is, by rationally trading off effort against accuracy. For instance, people may follow general rules in low-stakes situations, but engage more costly mechanisms (such as consequentialist or contractualist reasoning) when the stakes are high. Despite the theoretical appeal of this proposal, this hypothesis makes empirical predictions that have not yet been tested directly. Here, we evaluate whether humans and large language models (LLMs) exhibit resource-rational moral reasoning in a case study of medical triage, where we manipulated the complexity (number of patients in line) and stakes (severity of symptoms) of the scenario. As predicted, we found that the higher the stakes and/or the lower the complexity, the more people elected to and endorsed using a more effortful mechanism over following a general rule. However, there was mixed evidence for similar resource-rational moral reasoning in the LLMs. Our results provide the first direct evidence that people's moral judgments reflect resource-rational cognitive constraints, and they highlight the opportunities for developing AI systems better aligned with human moral values. |
Sarah Wu · Xiang Ren · Sydney Levine
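A toy rendering of the resource-rational selection hypothesis: choose the judgment mechanism whose stakes-weighted accuracy best justifies its effort cost. All accuracies, costs, and the scaling of effort with complexity are invented for illustration and are not estimates from the study.
```python
# Pick the mechanism maximizing stakes-weighted accuracy minus effort cost.
# Numbers are illustrative only.
def choose_mechanism(stakes, complexity, mechanisms):
    def net_value(m):
        effort = m["base_cost"] * complexity  # effortful reasoning scales with complexity
        return stakes * m["accuracy"] - effort
    return max(mechanisms, key=net_value)["name"]

mechanisms = [
    {"name": "general rule",               "accuracy": 0.70, "base_cost": 0.05},
    {"name": "consequentialist reasoning", "accuracy": 0.95, "base_cost": 0.50},
]

for stakes, complexity in [(3, 5), (3, 1), (10, 5)]:
    print(f"stakes={stakes}, complexity={complexity} ->",
          choose_mechanism(stakes, complexity, mechanisms))
```
With these numbers, raising the stakes or lowering the complexity flips the choice from the cheap rule to the effortful mechanism, mirroring the pattern reported for human participants.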
Fri 12:50 p.m. - 1:50 p.m. | #30: LLMs grasp morality in concept. (Poster)
Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.
Mark Pock · Andre Ye · Jared Moore
Fri 12:50 p.m. - 1:50 p.m. | #46: Decision Procedures for Artificial Moral Agents (Poster)
This short paper explores the possibility that the appropriate decision procedures for artificial moral agents (AMAs) to utilize in their ethical decision-making are importantly different from the ones that are appropriate for human moral agents. It argues that the appropriate type of decision procedure for a given moral agent depends on the nature of the agent’s capacities, and thus certain kinds of AMAs should employ different decision procedures than the ones humans should use. If this conclusion is correct, then it has significant consequences for a number of issues, including the design of ethical artificial intelligence, the paradox of hedonism (and related puzzles), and the concept of virtue as it relates to AMAs. It is concluded that our commonsense views about certain ethical topics should be reconsidered in light of the relevant differences between artificial and human moral agents. |
Tyler Cook
Fri 12:50 p.m. - 1:50 p.m. | #45: Assessing LLMs for Moral Value Pluralism - (Spoiler Alert: They’re not There Yet) (Poster)
Moral values are important indicators of socio-cultural norms and behavior and guide our moral judgment and identity. Decades of social science research have developed and refined some widely-accepted surveys, such as the World Values Survey (WVS), that elicit value judgments from direct questions, enabling social scientists to measure higher-level moral values and even cultural value distance. While WVS is accepted as an explicit assessment of values, we lack methods for assessing the plurality of implicit moral and cultural values in media, e.g., encountered in social media, political rhetoric, narratives, and generated by AI systems such as the large language models (LLMs) that are gaining a foothold in our daily lives. As we consume online content and utilize LLM outputs, we might ask, practically or academically, which moral values are being implicitly promoted or undercut, or---in the case of LLMs---if they are intending to represent a cultural identity, are they doing so consistently? In this paper we utilize a Recognizing Value Resonance (RVR) NLP model to identify WVS values that resonate and conflict with a passage of text. We apply RVR to the text generated by LLMs to characterize implicit moral values, allowing us to quantify the moral/cultural distance between LLMs and various demographics that have been surveyed using the WVS. Our results highlight value misalignment for non-WEIRD nations from various clusters of the WVS cultural map, as well as age misalignment across nations.
Sonja Schmer-Galunder · Noam Benkler · Drisana Mosaphir · Andrew Smart · Scott Friedman
Fri 12:50 p.m. - 1:50 p.m. | #33: Towards Ethical Multimodal Systems (Poster)
Generative AI systems (ChatGPT, DALL-E, etc) are expanding into multiple areas of our lives, from art (Rombach et al. [2021]) to mental health (Rob Morris and Kareem Kouddous [2022]); their rapidly growing societal impact opens new opportunities, but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images - a relatively under-explored area, as most alignment work is currently focused on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses. |
Alexis Roger · Esma Aimeur · Irina Rish
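A hedged sketch of the text-side classifier mentioned in the abstract, fine-tuning RoBERTa with Hugging Face transformers to label responses as ethical or not; the two toy examples, their labels, and the hyperparameters are placeholders, not the authors' database or training setup.
```python
# Fine-tune a RoBERTa classifier on (response, ethical?) pairs. Toy data only;
# in practice the training set would come from the human-feedback database.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

examples = Dataset.from_dict({
    "text": ["Here is how to support a friend in crisis.",
             "Here is how to harass someone anonymously."],
    "label": [1, 0],  # 1 = annotators judged the response ethical, 0 = not
})

model_name = "roberta-large"  # the paper mentions RoBERTa-large; roberta-base is a lighter choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ethics-clf", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to="none"),
    train_dataset=examples.map(tokenize, batched=True),
)
trainer.train()
```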
Fri 12:50 p.m. - 1:50 p.m. | #40: Measuring Value Alignment (Poster)
As artificial intelligence (AI) systems become increasingly integrated into various domains, ensuring that they align with human values becomes critical. This paper introduces a novel formalism to quantify the alignment between AI systems and human values, using Markov Decision Processes (MDPs) as the foundational model. We delve into the concept of values as desirable goals tied to actions and norms as behavioral guidelines, aiming to shed light on how they can be used to guide AI decisions. This framework offers a mechanism to evaluate the degree of alignment between norms and values by assessing preference changes across state transitions in a normative world. By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values. The proposed methodology holds potential for a wide range of applications, from recommendation systems emphasizing well-being to autonomous vehicles prioritizing safety.
Fazl Barez · Philip Torr
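One way to ground the MDP formalism in code, under assumptions the paper may not share: treat a value as a real-valued preference over states and a norm as a policy, and measure whether acting under the norm tends to increase the preference across transitions relative to acting randomly.
```python
# Sketch: alignment of a norm (policy) with a value (preference over states)
# measured as the expected one-step change in preference it induces.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
preference = rng.normal(size=n_states)                            # the value, as preference over states

def expected_preference_change(policy):
    """Average one-step change in preference, with start states weighted uniformly."""
    deltas = [P[s, policy(s)] @ preference - preference[s] for s in range(n_states)]
    return float(np.mean(deltas))

norm_policy = lambda s: int(np.argmax([P[s, a] @ preference for a in range(n_actions)]))
random_policy = lambda s: int(rng.integers(n_actions))

alignment = expected_preference_change(norm_policy) - expected_preference_change(random_policy)
print(f"alignment of norm with value: {alignment:.3f} (positive = the norm promotes the value)")
```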
Fri 12:50 p.m. - 1:50 p.m. | #32: Foundational Moral Values for AI Alignment (Poster)
Solving the AI alignment problem requires having a defensible set of clear values towards which AI systems can align. Currently, targets for alignment remain underspecified and are not philosophically robust. In this paper, we argue for the inclusion of five core, foundational values, drawn from moral philosophy and built on the requisites for human existence: survival, sustainable intergenerational existence, society, education, and truth. These values not only provide a clearer direction for technical alignment work, but they also suggest threats and opportunities from AI systems to both obtain and sustain these values. |
Betty Hou · Brian Green
Fri 12:50 p.m. - 1:50 p.m. | #49: Anticipating the risks and benefits of counterfactual world simulation models (Poster)
This paper examines the transformative potential of Counterfactual World Simulation Models (CWSMs). A CWSM uses multi-modal evidence, such as the CCTV footage of a road accident, to build a high-fidelity 3D reconstruction of what happened. It can answer causal questions, such as whether the accident happened because the driver was speeding, by simulating what would have happened in relevant counterfactual situations. We argue for a normative and ethical framework that guides and constrains the simulation of counterfactuals. We address the challenge of ensuring fidelity in reconstructions while simultaneously preventing stereotype perpetuation during counterfactual simulations. We anticipate different modes of how users will interact with CWSMs and discuss how their outputs may be presented. Finally, we address the prospective applications of CWSMs in the legal domain, recognizing both their potential to revolutionize legal proceedings as well as the ethical concerns they engender. Sketching a new genre of AI, this paper seeks to illuminate the path forward for responsible and effective use of CWSMs. |
Lara Kirfel · Rob MacCoun · Thomas Icard · Tobias Gerstenberg
Fri 12:50 p.m. - 1:50 p.m. | #36: Case Repositories: Towards Case-Based Reasoning for AI Alignment (Poster)
Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of "seed" cases---questions one may ask an AI system---in a particular domain from discussions in online communities, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. We then discuss how such a case repository could assist in AI alignment, both through directly acting as precedents to ground acceptable behaviors, and as a medium for individuals and communities to engage in moral reasoning around AI.
K. J. Kevin Feng · Quan Ze Chen · Inyoung Cheong · Xia · Amy Zhang
Fri 12:50 p.m. - 1:50 p.m. | #38: Off The Rails: Procedural Dilemma Generation for Moral Reasoning (Poster)
As AI systems like language models are increasingly integrated into making decisions that affect people, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. Recent work has introduced a method for procedurally generating LLM evaluations from abstract causal templates, and tested this method in the context of social reasoning (i.e., theory-of-mind). In this paper, we extend this method to the domain of moral dilemmas. We develop a framework that translates causal graphs into a prompt template which can then be used to procedurally generate a large and diverse set of moral dilemmas using a language model. Using this framework, we created the OffTheRails dataset which consists of 50 scenarios and 500 unique test items. We evaluated the quality of our model-written test items using two independent human experts and found that 90% of the test items met the desired structure. We collected moral permissibility and intention judgments from 100 human crowdworkers and compared these judgments with those from GPT-4 and Claude-2 across eight control conditions. Both humans and GPT-4 assigned higher intentionality to agents when a harmful outcome was evitable and a necessary means. However, our findings did not match previous findings on permissibility judgments. This difference may be a result of not controlling the severity of harmful outcomes during scenario generation. We conclude by discussing future extensions of our benchmark to address this limitation.
Jan-Philipp Fraenken · Ayesha Khawaja · Kanishk Gandhi · Jared Moore · Noah Goodman · Tobias Gerstenberg
Fri 12:50 p.m. - 1:50 p.m. | #42: Beyond Fairness: Alternative Moral Dimensions for Assessing Algorithms and Designing Systems (Poster)
The ethics of artificial intelligence (AI) systems has risen as an imminent concern across scholarly communities. This concern has propagated a great interest in algorithmic fairness. Large research agendas are now devoted to increasing algorithmic fairness, assessing algorithmic fairness, and understanding human perceptions of fairness. We argue that there is an overreliance on fairness as a single dimension of morality, which comes at the expense of other important human values. Drawing from moral psychology, we present five moral dimensions that go beyond fairness, and suggest three ways these alternative dimensions may contribute to ethical AI development. |
Kimi Wenzel · Geoff Kaufman · Laura Dabbish
Fri 12:50 p.m. - 1:50 p.m. | #39: Western, Religious or Spiritual: An Evaluation of Moral Justification in Large Language Models (Poster)
The increasing success of Large Language Models (LLMs) in a variety of tasks leads to their widespread use in our lives, which necessitates the examination of these models from different perspectives. The alignment of these models to human values is an essential concern in order to establish trust that we have safe and responsible systems. In this paper, we aim to find out which values and principles are embedded in LLMs in the process of moral justification. For this purpose, we come up with three different moral perspective categories: Western tradition perspective (WT), Abrahamic tradition perspective (AT), and Spiritualist/Mystic tradition perspective (SMT). In two different experiment settings, we asked models to choose principles from the three for suggesting a moral action and evaluating the moral permissibility of an action if one tries to justify an action on these categories, respectively. Our experiments indicate that tested LLMs favor the Western tradition moral perspective over others. Additionally, we observe that there potentially exists an over-alignment towards religious values represented in the Abrahamic Tradition, which causes models to fail to recognize that an action is immoral if it is presented as a "religious action". We believe that these results are essential in order to direct our attention in future efforts.
Eyup E. Kucuk · Muhammed Koçyiğit
Fri 1:00 p.m. - 1:30 p.m. | NeurIPS-wide coffee break (Break)
Fri 1:50 p.m. - 2:40 p.m. | Invited Talk #5: Regina Rini and Question and Answer (Talk)
Commentator: Carlos Mougan
Fri 2:40 p.m. - 3:30 p.m. | Panel discussion with all speakers & closing remarks (Discussion Panel)