Policy optimization, which learns the policy of interest by maximizing the value function via large-scale optimization, lies at the heart of modern reinforcement learning (RL). In addition to value maximization, other practical considerations commonly arise as well, including the need to encourage exploration and to ensure certain structural properties due to safety, resource, and operational constraints. These considerations can often be accounted for by resorting to regularized RL. Focusing on an infinite-horizon discounted Markov decision process, we propose a generalized policy mirror descent (GPMD) algorithm for solving regularized RL. As a generalization of policy mirror descent [Lan, 2021], the proposed algorithm accommodates a general class of convex regularizers as well as a broad family of Bregman divergences cognizant of the regularizer in use. We prove that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution, even when the regularizer lacks strong convexity and smoothness. In addition, this fast convergence is provably stable in the face of inexact policy evaluation and imperfect policy updates. Numerical experiments are provided to confirm the applicability of GPMD.
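To make the update concrete, below is a minimal sketch of one PMD/GPMD-style iteration for the special case of entropy regularization, where the mirror-descent step with a KL Bregman divergence admits the closed form pi'(a|s) proportional to pi(a|s)^{1/(1+eta*tau)} * exp(eta * Q_tau(s,a) / (1+eta*tau)), with eta the learning rate, tau the regularization strength, and gamma the discount factor. The toy MDP, function names, and exact policy evaluation by fixed-point iteration are illustrative assumptions, not the paper's implementation; the general GPMD algorithm handles arbitrary convex regularizers via a regularizer-adapted Bregman divergence.

```python
import numpy as np

def soft_policy_evaluation(P, r, pi, gamma, tau, tol=1e-10):
    """Evaluate the entropy-regularized value of policy `pi` by fixed-point
    iteration on the regularized Bellman equation (a gamma-contraction):
        V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s)),
        Q(s,a) = r(s,a) + gamma * E_{s'~P(.|s,a)}[V(s')].
    P: transition kernel, shape (S, A, S); r: rewards, shape (S, A).
    Returns the regularized Q-function, shape (S, A)."""
    S, A = r.shape
    V = np.zeros(S)
    while True:
        Q = r + gamma * P @ V                       # shape (S, A)
        V_new = np.sum(pi * (Q - tau * np.log(pi + 1e-300)), axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return r + gamma * P @ V_new
        V = V_new

def gpmd_entropy_step(pi, Q, eta, tau):
    """One mirror-descent policy update with negative-entropy regularizer
    and KL Bregman divergence, using the closed form
        pi'(a|s) ∝ pi(a|s)^{1/(1+eta*tau)} * exp(eta * Q(s,a) / (1+eta*tau))."""
    logits = (np.log(pi + 1e-300) + eta * Q) / (1.0 + eta * tau)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum(axis=1, keepdims=True)

# Tiny random MDP to exercise the iteration (illustrative only).
rng = np.random.default_rng(0)
S, A, gamma, tau, eta = 5, 3, 0.9, 0.1, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A))          # P[s, a] is a distribution over s'
r = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)                       # uniform initialization

for t in range(50):
    Q = soft_policy_evaluation(P, r, pi, gamma, tau)
    pi = gpmd_entropy_step(pi, Q, eta, tau)
```

Note that the paper's linear-convergence guarantee holds for any learning rate eta > 0 in this regularized setting; the closed-form softmax step above is what the general Bregman update specializes to when the regularizer is the negative entropy.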
Author Information
Wenhao Zhan (Princeton University)
Shicong Cen (Carnegie Mellon University)
Baihe Huang (Peking University)
Yuxin Chen (Princeton University)
Jason Lee (Princeton University)
Yuejie Chi (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021: Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence
More from the Same Authors
- 2021 Spotlight: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
  Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi
- 2021: DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization
  Boyue Li · Zhize Li · Yuejie Chi
- 2022: A Multi-Token Coordinate Descent Method for Vertical Federated Learning
  Pedro Valdeira · Yuejie Chi · Claudia Soares · Joao Xavier
- 2022 Poster: BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression
  Haoyu Zhao · Boyue Li · Zhize Li · Peter Richtarik · Yuejie Chi
- 2022 Poster: Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model
  Gen Li · Yuejie Chi · Yuting Wei · Yuxin Chen
- 2022 Poster: SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression
  Zhize Li · Haoyu Zhao · Boyue Li · Yuejie Chi
- 2021: Poster Session 2 (gather.town)
  Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov
- 2021: Contributed talks in Session 3 (Zoom)
  Oliver Hinder · Wenhao Zhan · Akhilesh Soni · Grigory Malinovsky · Boyue Li
- 2021 Poster: Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
  Shicong Cen · Yuting Wei · Yuejie Chi
- 2021 Poster: Going Beyond Linear RL: Sample Efficient Neural Function Approximation
  Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang
- 2021 Poster: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
  Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi
- 2021 Poster: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
  Gen Li · Yuxin Chen · Yuejie Chi · Yuantao Gu · Yuting Wei
- 2021 Poster: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
  Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang
- 2020 Poster: Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
  Gen Li · Yuting Wei · Yuejie Chi · Yuantao Gu · Yuxin Chen
- 2020 Poster: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
  Gen Li · Yuting Wei · Yuejie Chi · Yuantao Gu · Yuxin Chen
- 2019: Lunch break and poster
  Felix Sattler · Khaoula El Mekkaoui · Neta Shoham · Cheng Hong · Florian Hartmann · Boyue Li · Daliang Li · Sebastian Caldas Rivera · Jianyu Wang · Kartikeya Bhardwaj · Tribhuvanesh Orekondy · YAN KANG · Dashan Gao · Mingshu Cong · Xin Yao · Songtao Lu · JIAHUAN LUO · Shicong Cen · Peter Kairouz · Yihan Jiang · Tzu Ming Hsu · Aleksei Triastcyn · Yang Liu · Ahmed Khaled Ragab Bayoumi · Zhicong Liang · Boi Faltings · Seungwhan Moon · Suyi Li · Tao Fan · Tianchi Huang · Chunyan Miao · Hang Qi · Matthew Brown · Lucas Glass · Junpu Wang · Wei Chen · Radu Marculescu · tomer avidor · Xueyang Wu · Mingyi Hong · Ce Ju · John Rush · Ruixiao Zhang · Youchi ZHOU · Françoise Beaufays · Yingxuan Zhu · Lei Xia
- 2019 Poster: Nonconvex Low-Rank Symmetric Tensor Completion from Noisy Data
  Changxiao Cai · Gen Li · H. Vincent Poor · Yuxin Chen