NeurIPS Towards Defining Deception in Structural Causal Games

Poster in Workshop: Workshop on Machine Learning Safety

Towards Defining Deception in Structural Causal Games

Francis Ward


Abstract:

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals. There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a functional definition of deception in structural causal games, grounded in the philosophical literature. We present several examples to establish that our formal definition captures philosophical and commonsense desiderata for deception.

