This tutorial will cover policy gradient methods in reinforcement learning, with a focus on understanding foundational ideas from an optimization perspective. We will discuss two properties of the policy objective that are critical for convergence rates when using stochastic gradient approaches: variance and curvature. We will explain how the policy objective can be a particularly difficult optimization problem, since it can have large flat regions and stochastic samples of the gradient can have very high variance. We will first explain how to use standard tools from optimization to reduce the variance of the gradient estimate, as well as techniques to mitigate curvature issues. We will then discuss optimization improvements that leverage more knowledge about the objective, including the Markov property and how to modify the state distribution for better coverage. We will discuss how standard actor-critic methods with (off-policy) data re-use provide RL-specific variance reduction approaches. We will conclude with an overview of what is known theoretically about the policy objective, including the role of entropy regularization and exploration in mitigating curvature issues.
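As a minimal, illustrative sketch of the variance-reduction idea referenced above (not part of the tutorial materials), the snippet below estimates a REINFORCE-style policy gradient on a toy two-action problem, with and without a constant baseline subtracted from the return. The toy environment, the softmax policy parameterization, and the way the baseline is chosen are hypothetical choices made only for this example.

```python
# Sketch: Monte Carlo policy gradient estimate with and without a constant
# baseline. A constant baseline keeps the estimator unbiased and typically
# reduces its variance. Everything here is a toy setup for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(theta, n_steps=20):
    """Roll out a toy two-action softmax policy; return (states, actions, rewards)."""
    states, actions, rewards = [], [], []
    s = 0.5
    for _ in range(n_steps):
        logits = np.array([theta * s, -theta * s])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(2, p=probs)
        r = 1.0 if a == 0 else 0.0           # toy reward
        states.append(s); actions.append(a); rewards.append(r)
        s = np.clip(s + (0.1 if a == 0 else -0.1), -1.0, 1.0)
    return states, actions, rewards

def grad_log_pi(theta, s, a):
    """d/d(theta) of log pi(a|s) for the toy softmax policy above."""
    logits = np.array([theta * s, -theta * s])
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    dlogits = np.array([s, -s])              # derivative of each logit w.r.t. theta
    return dlogits[a] - probs @ dlogits

def reinforce_gradient(theta, n_episodes=200, baseline=0.0):
    """Average over episodes of (sum of grad log pi) * (return - baseline)."""
    samples = []
    for _ in range(n_episodes):
        states, actions, rewards = sample_episode(theta)
        G = sum(rewards)
        g = sum(grad_log_pi(theta, s, a) for s, a in zip(states, actions))
        samples.append(g * (G - baseline))
    return np.mean(samples), np.var(samples)

# A rough estimate of the mean return serves as the constant baseline.
pilot_returns = [sum(sample_episode(0.5)[2]) for _ in range(50)]
b = np.mean(pilot_returns)

g0, v0 = reinforce_gradient(0.5, baseline=0.0)
gb, vb = reinforce_gradient(0.5, baseline=b)
print(f"no baseline:   grad estimate {g0:+.3f}, sample variance {v0:.1f}")
print(f"with baseline: grad estimate {gb:+.3f}, sample variance {vb:.1f}")
```

Both estimators target the same gradient; the baseline only changes the variance of the per-episode samples, which is the simplest instance of the standard variance-reduction tools the tutorial builds on.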
Author Information
Sham M Kakade (Toyota Technological Institute)
Martha White
Nicolas Le Roux (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A
  Thu, Dec 10th, 09:00 -- 09:50 PM
More from the Same Authors
- 2021 Poster: Continual Auxiliary Task Learning
  Matthew McLeod · Chunlok Lo · Matthew Schlegel · Andrew Jacobsen · Raksha Kumaraswamy · Martha White · Adam White
- 2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning
  Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White
- 2019: Closing Remarks
  Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
- 2019 Workshop: The Optimization Foundations of Reinforcement Learning
  Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
- 2019: Opening Remarks
  Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
- 2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation
  Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari