The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known a priori, so we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where the aim is to learn a preference-agnostic policy using only a finite dataset of offline demonstrations from other agents, annotated with their preferences. The key contributions of this work are two-fold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. D4MORL contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose Pareto-Efficient Decision Agents (PEDA), a family of offline MORL algorithms that builds on and extends Decision Transformers via a novel preference-and-return-conditioned policy. Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and, with appropriate conditioning, provides an excellent approximation of the Pareto front, as measured by the hypervolume and sparsity metrics.
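The hypervolume and sparsity metrics mentioned above are standard quantities for evaluating an approximate Pareto front in MORL. As a concrete illustration, here is a minimal NumPy sketch of both for a 2-objective maximization problem; the function names, the staircase hypervolume routine, and the fixed reference point `ref` are illustrative assumptions, not code from the paper.

```python
import numpy as np

def pareto_front(points):
    """Filter a set of return vectors down to its non-dominated subset.

    points: (N, M) array, higher is better in every objective.
    """
    points = np.asarray(points, dtype=float)
    keep = []
    for p in points:
        # p is dominated if some q is >= p everywhere and > p somewhere.
        dominated = np.any(
            np.all(points >= p, axis=1) & np.any(points > p, axis=1)
        )
        if not dominated:
            keep.append(p)
    return np.unique(np.array(keep), axis=0)

def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-objective Pareto front w.r.t. `ref`.

    Sorts the front by the first objective (descending) and sums the
    staircase of dominated rectangles; valid only for M = 2.
    """
    front = front[np.argsort(-front[:, 0])]
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv

def sparsity(front):
    """Average squared gap between consecutive front solutions, summed
    over objectives; lower values mean a denser front."""
    n = len(front)
    if n < 2:
        return 0.0
    total = 0.0
    for j in range(front.shape[1]):
        v = np.sort(front[:, j])
        total += np.sum((v[1:] - v[:-1]) ** 2)
    return total / (n - 1)

if __name__ == "__main__":
    # Toy example: vector-valued returns of several rolled-out policies.
    returns = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 1.0], [1.5, 2.0]])
    front = pareto_front(returns)          # drops the dominated (1.5, 2.0)
    print("hypervolume:", hypervolume_2d(front, ref=np.array([0.0, 0.0])))
    print("sparsity:", sparsity(front))
```

Under this reading of the metrics, a policy family is better when it attains a larger hypervolume (a front that dominates more of objective space) at a smaller sparsity (solutions spread densely along the front).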
Author Information
Baiting Zhu (University of California, Los Angeles)
Meihua Dang (University of California, Los Angeles)
Aditya Grover (University of California, Los Angeles)
More from the Same Authors
- 2022 : Conditioned Spatial Downscaling of Climate Variables
  Alex Hung · Evan Becker · Ted Zadouri · Aditya Grover
- 2022 : Short-range forecasts of global precipitation using deep learning-augmented numerical weather prediction
  Manmeet Singh · Vaisakh SB · Nachiketa Acharya · Aditya Grover · Suryachandra A. Rao · Bipin Kumar · Zong-Liang Yang · Dev Niyogi
- 2022 : Machine Learning for Predicting Climate Extremes
  Hritik Bansal · Shashank Goel · Tung Nguyen · Aditya Grover
- 2022 : Generative Pretraining for Black-Box Optimization
  Siddarth Krishnamoorthy · Satvik Mashkaria · Aditya Grover
- 2022 : ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
  Tung Nguyen · Qinqing Zheng · Aditya Grover
- 2022 : Pareto-Efficient Decision Agents for Offline Multi-Objective Reinforcement Learning
  Baiting Zhu · Meihua Dang · Aditya Grover
- 2022 Poster: Masked Autoencoding for Scalable and Generalizable Decision Making
  Fangchen Liu · Hao Liu · Aditya Grover · Pieter Abbeel
- 2022 Poster: Sparse Probabilistic Circuits via Pruning and Growing
  Meihua Dang · Anji Liu · Guy Van den Broeck
- 2022 Poster: CyCLIP: Cyclic Contrastive Language-Image Pretraining
  Shashank Goel · Hritik Bansal · Sumit Bhatia · Ryan Rossi · Vishwa Vinay · Aditya Grover