Workshop: Workshop on Machine Learning Safety

Formalizing the Problem of Side Effect Regularization

Alex Turner · Aseem Saxena · Prasad Tadepalli


AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI’s side effects: agents must weigh “how much of a mess they make” against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework [Shah et al., 2021]. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent’s ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
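
The trade-off described above can be illustrated with a minimal sketch. This is not the paper's formal result or its implementation: the scoring rule (proxy reward plus a weighted average of attainable value over candidate future tasks), the option names, the `weight` parameter, and all numbers are assumptions made up for illustration.

```python
from typing import Dict, List

# Hypothetical toy data: each option has an immediate proxy reward and the value
# the agent could still attain afterwards on each of three candidate future tasks.
OPTIONS: Dict[str, Dict[str, object]] = {
    # Higher proxy reward, but two of the three candidate tasks become unattainable.
    "shortcut": {"proxy": 1.0, "future": [1.0, 0.0, 0.0]},
    # Slightly lower proxy reward, all candidate tasks remain attainable.
    "detour": {"proxy": 0.8, "future": [1.0, 1.0, 1.0]},
}


def regularized_score(proxy: float, future: List[float], weight: float) -> float:
    """Proxy reward plus `weight` times the mean attainable future-task value."""
    mean_future_value = sum(future) / len(future)
    return proxy + weight * mean_future_value


def best_option(options: Dict[str, Dict[str, object]], weight: float) -> str:
    """Return the option name with the highest regularized score."""
    return max(
        options,
        key=lambda name: regularized_score(
            options[name]["proxy"], options[name]["future"], weight
        ),
    )


if __name__ == "__main__":
    for weight in (0.0, 1.0):
        print(weight, best_option(OPTIONS, weight))
```

In this toy setup, a weight of zero means only the proxy reward matters, so the agent takes the shortcut; a sufficiently large weight makes preserving the ability to complete future tasks dominate, so it takes the detour.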
