We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
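The dependence of the best policy on the preference can be illustrated with a small sketch of linear scalarization. This is not the authors' algorithm: it only shows, for made-up vector returns, how a linear preference turns a vector of per-objective returns into a scalar and how the policy that maximizes that scalar shifts as the preference changes. The function names `scalarize` and `best_policy` and all numbers are hypothetical.

```python
def scalarize(vector_return, preference):
    """Linear scalarization: weighted sum of per-objective returns."""
    return sum(r * w for r, w in zip(vector_return, preference))


def best_policy(policy_returns, preference):
    """Index of the policy with the highest scalarized return."""
    scores = [scalarize(v, preference) for v in policy_returns]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical expected returns of three policies over two objectives.
policy_returns = [
    [10.0, 1.0],   # strong on objective 1 only
    [6.0, 6.0],    # balanced
    [1.0, 10.0],   # strong on objective 2 only
]

# Each preference is a weight vector over the two objectives.
for preference in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    print(preference, "-> policy", best_policy(policy_returns, preference))
```

In this toy setting each preference selects a different policy, which is why a separate model per preference scales poorly and a single preference-conditioned representation is attractive.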
Author Information
Runzhe Yang (Princeton University)
I’m “Tony” Runzhe Yang (杨闰哲), currently a final-year Ph.D. student in the Computer Science Department and the Neuroscience Institute at Princeton University. Previously, I worked as a research intern at Google Brain, the Simons Foundation, and Cornell University. I received my Bachelor's degree in Computer Science from the ACM Honors Class, Zhiyuan College, SJTU. Research interests: NLP/RL/NeuroAI
Xingyuan Sun (Princeton University)
Karthik Narasimhan (Princeton University)
More from the Same Authors
- 2021 Spotlight: Safe Reinforcement Learning with Natural Language Constraints
  Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan
- 2022 Poster: DataMUX: Data Multiplexing for Neural Networks
  Vishvak Murahari · Carlos Jimenez · Runzhe Yang · Karthik Narasimhan
- 2021 Poster: Safe Reinforcement Learning with Natural Language Constraints
  Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan
- 2021 Poster: SILG: The Multi-domain Symbolic Interactive Language Grounding Benchmark
  Victor Zhong · Austin W. Hanjie · Sida Wang · Karthik Narasimhan · Luke Zettlemoyer
- 2020: Invited talk - Bringing Back Text Understanding into Text-based Games - Karthik Narasimhan
  Karthik Narasimhan
- 2020 Poster: Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters
  Sulin Liu · Xingyuan Sun · Peter J. Ramadge · Ryan Adams
- 2020 Poster: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
  Raeid Saqur · Karthik Narasimhan
- 2020 Poster: Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
  Zhiwei Deng · Karthik Narasimhan · Olga Russakovsky
- 2018: Harnessing the synergy between natural language and interactive learning
  Karthik Narasimhan