A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Runzhe Yang · Xingyuan Sun · Karthik Narasimhan

Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #222

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
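To make the role of linear preferences concrete, the following minimal sketch shows how the best choice among a fixed set of policies can shift as the preference vector changes. It is an illustration of linear scalarization only, not the authors' algorithm: the policy names, the return values, and the helper functions `scalarize` and `best_policy` are all hypothetical.

```python
# Hypothetical expected vector returns of three candidate policies
# over two competing objectives (illustrative numbers, not from the paper).
RETURNS = {
    "policy_a": (10.0, 1.0),
    "policy_b": (6.0, 6.0),
    "policy_c": (1.0, 10.0),
}


def scalarize(vector_return, preference):
    """Linear scalarization: dot product of the preference and the vector return."""
    return sum(w * r for w, r in zip(preference, vector_return))


def best_policy(preference):
    """Return the name of the candidate policy with the highest scalarized return."""
    return max(RETURNS, key=lambda name: scalarize(RETURNS[name], preference))


# The same set of policies yields a different winner under each preference.
for preference in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(preference, "->", best_policy(preference))
```

With these made-up numbers, each of the three preferences selects a different policy, which is why a single model that covers the whole preference space has to represent more than one scalar value per state and action.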

Author Information

Runzhe Yang (Princeton University)

I’m “Tony” Runzhe Yang (杨闰哲), currently a second-year Ph.D. student in the Computer Science Department and the Neuroscience Institute at Princeton University. Previously, I worked as a research intern at Cornell University. I received my Bachelor's degree in Computer Science from the ACM Honors Class, Zhiyuan College, SJTU. As a junior researcher in the field of Artificial Intelligence, I am enthusiastic about all kinds of puzzles about human intelligence. My research interests include Reinforcement Learning, Deep Learning and Neuro-Inspired Machine Learning, Bayesian Inference and Graphical Models, Game Theory and Multi-Agent Systems, as well as their applications in Dialogue Systems, Linguistics, Robotics, and Scientific Discovery, especially Neuroscience.

Xingyuan Sun (Princeton University)
Karthik Narasimhan (Princeton University)
