Workshop: Workshop on Machine Learning Safety

Alignment as a Dynamic Process

Paul de Font-Reaulx


Most learning AIs today have exogenously given and fixed aims which they gradually learn to optimize for. It has been an assumption in alignment research that artificial general intelligences of the kind that could pose an X-risk would too. On this assumption, value alignment becomes the task of finding the right set of aims before we allow the agent to act. However, an agent can also have aims that fundamentally change during their lifetime. The task of aligning such agents is not one of specifying a set of aims, but of designing a meta-function that guides the agent’s developing aims to an equilibrium that produces behaviour aligned with our human values. If artificial general intelligences would possess such dynamic aims, then this has significant implications for the kind of alignment research we should conduct today. In this paper, I argue that there is a substantial probability that artificial general intelligences would have such dynamic aims, and in response I articulate an agenda for dynamic alignment research.

Chat is not available.