

Poster in Workshop: Workshop on Machine Learning Safety

The Reward Hypothesis is False

Joar Skalse · Alessandro Abate


Abstract:

The reward hypothesis is the hypothesis that "all of what we mean by goals and purposes can be well thought of as the maximisation of the expected value of the cumulative sum of a received scalar signal" (Sutton & Barto, 2018). In this paper, we argue that this hypothesis is false. We consider three natural classes of reinforcement learning tasks (multi-objective reinforcement learning, risk-averse reinforcement learning, and modal reinforcement learning), and prove mathematically that these tasks cannot be expressed using any scalar, Markovian reward function. We thus disprove the reward hypothesis by providing many examples of tasks which are natural and intuitive to describe, but which are nonetheless impossible to express using reward functions. In the process, we provide necessary and sufficient conditions for when a multi-objective reinforcement learning problem can be reduced to ordinary, scalar-reward reinforcement learning. We also call attention to a new class of reinforcement learning problems (namely those we call "modal" problems), which have so far not been given any systematic treatment in the reinforcement learning literature.
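For context, a standard textbook formalisation of the objective the hypothesis refers to is sketched below, together with an illustrative risk-averse objective of the kind the abstract mentions. This is not taken from the paper itself; the mean-variance form is one common formalisation of risk aversion, and the paper's own constructions may differ.

% Illustrative formalisation (standard notation; an assumption, not the paper's own statement).
% The reward hypothesis asserts that every goal can be expressed as maximising
J(\pi) = \mathbb{E}_{\pi}\!\left[\textstyle\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]
% for some scalar, Markovian reward function R. By contrast, a risk-averse
% mean-variance objective such as
J_{\lambda}(\pi) = \mathbb{E}_{\pi}[G] - \lambda\,\mathrm{Var}_{\pi}[G],
\qquad G = \textstyle\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)
% is, in general, not of that form: the expected return is linear in the
% state-action occupancy measure, while the variance term is not.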
