Workshop: Workshop on Machine Learning Safety

Towards Defining Deception in Structural Causal Games

Francis Ward


Deceptive agents are a challenge for the safety, trustworthiness, and cooperation ofAI systems. We focus on the problem that agents might deceive in order to achievetheir goals. There are a number of existing definitions of deception in the literatureon game theory and symbolic AI, but there is no overarching theory of deceptionfor learning agents in games. We introduce a functional definition of deceptionin structural causal games, grounded in the philosophical literature. We presentseveral examples to establish that our formal definition captures philosophical andcommonsense desiderata for deception.

Chat is not available.