This tutorial covers policy gradient methods in reinforcement learning, with a focus on understanding the foundational ideas from an optimization perspective. We will examine the policy objective in terms of two properties that are critical for the convergence rates of stochastic gradient approaches: variance and curvature. We will explain why the policy objective can be a particularly difficult optimization problem: it can have large flat regions, and stochastic samples of the gradient can have very high variance. We will first explain how to use standard tools from optimization to reduce the variance of the gradient estimate, as well as techniques to mitigate curvature issues. We will then discuss optimization improvements that leverage more knowledge about the objective, including the Markov property and how to modify the state distribution for more coverage. We will discuss how standard Actor-Critic methods with (off-policy) data re-use provide RL-specific variance reduction approaches. We will conclude with an overview of what is known theoretically about the policy objective, where we discuss the role of entropy regularization and exploration in mitigating curvature issues.
Timetable:
Nicolas: 40-minute presentation + 10-minute Q&A
Martha: 40-minute presentation + 10-minute Q&A
Sham: 40-minute presentation + 10-minute Q&A
Bios and timetable are on the tutorial website: https://sites.google.com/ualberta.ca/rlandoptimization-neurips2020/home
Author Information
Sham M Kakade (Microsoft Research)
Martha White
Nicolas Le Roux (Google AI)
Related Events (a corresponding poster, oral, or spotlight)
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning »
Mon. Dec 7th 07:00 -- 09:30 PM
More from the Same Authors
2022 : Poly-S: Analyzing and Improving Polytropon for Data-Efficient Multi-Task Learning »
Lucas Page-Caccia · Edoardo Maria Ponti · Liyuan Liu · Matheus Pereira · Nicolas Le Roux · Alessandro Sordoni -
2022 : Target-based Surrogates for Stochastic Optimization »
Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux -
2023 Poster: Multi-Head Adapter Routing for Cross-Task Generalization »
Lucas Page-Caccia · Edoardo Maria Ponti · Zhan Su · Matheus Pereira · Nicolas Le Roux · Alessandro Sordoni -
2023 Poster: Deep language networks: joint prompt training of stacked LLMs using variational inference »
Alessandro Sordoni · Eric Yuan · Marc-Alexandre Côté · Matheus Pereira · Adam Trischler · Ziang Xiao · Arian Hosseini · Friederike Niedtner · Nicolas Le Roux -
2023 Poster: Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees »
Sharan Vaswani · Amirreza Kazemi · Reza Babanezhad Harikandeh · Nicolas Le Roux -
2021 Poster: Continual Auxiliary Task Learning »
Matthew McLeod · Chunlok Lo · Matthew Schlegel · Andrew Jacobsen · Raksha Kumaraswamy · Martha White · Adam White -
2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning »
Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White -
2020 Poster: An operator view of policy gradient methods »
Dibya Ghosh · Marlos C. Machado · Nicolas Le Roux -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2019 Spotlight: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2018 : Poster Session 1 (note there are numerous missing names here, all papers appear in all poster sessions) »
Akhilesh Gotmare · Kenneth Holstein · Jan Brabec · Michal Uricar · Kaleigh Clary · Cynthia Rudin · Sam Witty · Andrew Ross · Shayne O'Brien · Babak Esmaeili · Jessica Forde · Massimo Caccia · Ali Emami · Scott Jordan · Bronwyn Woods · D. Sculley · Rebekah Overdorf · Nicolas Le Roux · Peter Henderson · Brandon Yang · Tzu-Yu Liu · David Jensen · Niccolo Dalmasso · Weitang Liu · Paul Marc TRICHELAIR · Jun Ki Lee · Akanksha Atrey · Matt Groh · Yotam Hechtlinger · Emma Tosch -
2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation »
Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari -
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade -
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade -
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu -
2012 Poster: A latent factor model for highly multi-relational data »
Rodolphe Jenatton · Nicolas Le Roux · Antoine Bordes · Guillaume R Obozinski -
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2012 Poster: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
Nicolas Le Roux · Mark Schmidt · Francis Bach -
2012 Oral: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
Nicolas Le Roux · Mark Schmidt · Francis Bach -
2011 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng -
2011 Poster: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
Mark Schmidt · Nicolas Le Roux · Francis Bach -
2011 Poster: Stochastic convex optimization with bandit feedback »
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin -
2011 Oral: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
Mark Schmidt · Nicolas Le Roux · Francis Bach -
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang -
2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression »
Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir -
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade -
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade -
2009 Poster: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2009 Oral: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2008 Poster: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization »
Sham M Kakade · Karthik Sridharan · Ambuj Tewari -
2007 Poster: Learning the 2-D Topology of Images »
Nicolas Le Roux · Yoshua Bengio · Pascal Lamblin · Marc Joliveau · Balázs Kégl -
2007 Oral: The Price of Bandit Information for Online Optimization »
Varsha Dani · Thomas P Hayes · Sham M Kakade -
2007 Poster: The Price of Bandit Information for Online Optimization »
Varsha Dani · Thomas P Hayes · Sham M Kakade -
2007 Poster: Topmoumoute Online Natural Gradient Algorithm »
Nicolas Le Roux · Pierre-Antoine Manzagol · Yoshua Bengio