Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model-free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies. Their primary drawback is that, by being local in nature, they fail to adequately explore the environment. In contrast, while model-based approaches and Q-learning can, at least in theory, directly handle exploration through the use of optimism, their ability to handle model misspecification and function approximation is far less evident. This work introduces the Policy Cover Guided Policy Gradient (PC-PG) algorithm, which provably balances the exploration vs. exploitation tradeoff using an ensemble of learned policies (the policy cover). PC-PG enjoys polynomial sample complexity and run time for both tabular MDPs and, more generally, linear MDPs in an infinite-dimensional RKHS. Furthermore, PC-PG also has strong guarantees under model misspecification that go beyond the standard worst-case $\ell_\infty$ assumptions; these include approximation guarantees for state aggregation under an average-case error assumption, along with guarantees under a more general assumption where the approximation error under distribution shift is controlled. We complement the theory with empirical evaluation across a variety of domains in both reward-free and reward-driven settings.
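To make the idea of a policy cover concrete, here is a minimal reward-free sketch on a toy chain. It is not the authors' implementation. The chain MDP, the horizon, the count-based bonus, and all function names are assumptions made for illustration. The policy-gradient update is replaced by finite-horizon value iteration on the true transition function, which PC-PG itself does not assume access to.

```python
# Illustrative sketch only: a toy reward-free loop in the spirit of a policy cover.
# Assumptions (not from the abstract): a 5-state deterministic chain, a 0/1 bonus on
# state-action pairs the cover has never visited, and value iteration with the true
# model standing in for the policy-gradient step.

N_STATES, N_ACTIONS, HORIZON = 5, 2, 8  # hypothetical toy problem


def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)


def rollout_counts(policy):
    """Count the state-action pairs one policy visits when started in state 0."""
    counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for h in range(HORIZON):
        a = policy[h][s]
        counts[s][a] += 1
        s = step(s, a)
    return counts


def plan(bonus):
    """Backward induction that maximizes the summed bonus (planner stand-in)."""
    policy = [[0] * N_STATES for _ in range(HORIZON)]
    value = [0.0] * N_STATES
    for h in reversed(range(HORIZON)):
        new_value = []
        for s in range(N_STATES):
            q = [bonus[s][a] + value[step(s, a)] for a in range(N_ACTIONS)]
            best = max(range(N_ACTIONS), key=lambda a: q[a])  # ties go to action 0
            policy[h][s] = best
            new_value.append(q[best])
        value = new_value
    return policy


def build_policy_cover(n_epochs):
    """Grow the cover: reward uncovered pairs, plan, add the new policy, repeat."""
    cover = []
    coverage = [[0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_epochs):
        bonus = [[1.0 if c == 0 else 0.0 for c in row] for row in coverage]
        new_policy = plan(bonus)
        cover.append(new_policy)
        for s, row in enumerate(rollout_counts(new_policy)):
            for a, c in enumerate(row):
                coverage[s][a] += c
    return cover, coverage


cover, coverage = build_policy_cover(n_epochs=6)
covered = sum(c > 0 for row in coverage for c in row)
print(len(cover), covered)
```

Each epoch adds one policy that is steered toward state-action pairs the existing ensemble has not reached, so the ensemble's joint coverage grows over time. In the linear setting described above, a feature-based coverage measure would presumably replace the raw counts used here.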
Author Information
Alekh Agarwal (Microsoft Research)
Mikael Henaff (Microsoft)
Sham Kakade (University of Washington & Microsoft Research)
Wen Sun (Cornell University)
More from the Same Authors
-
2022 : Provable Benefits of Representational Transfer in Reinforcement Learning »
Alekh Agarwal · Yuda Song · Kaiwen Wang · Mengdi Wang · Wen Sun · Xuezhou Zhang -
2022 Spotlight: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL »
Jinglin Chen · Aditya Modi · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal -
2022 Poster: Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity »
Alekh Agarwal · Tong Zhang -
2021 : Invited Speaker Panel »
Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker -
2021 : Q&A for Sham Kakade »
Sham Kakade -
2021 : Generalization theory in Offline RL »
Sham Kakade -
2021 Poster: Bellman-consistent Pessimism for Offline Reinforcement Learning »
Tengyang Xie · Ching-An Cheng · Nan Jiang · Paul Mineiro · Alekh Agarwal -
2021 Poster: The Benefits of Implicit Regularization from SGD in Least Squares Problems »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Dean Foster · Sham Kakade -
2021 Poster: Robust and differentially private mean estimation »
Xiyang Liu · Weihao Kong · Sham Kakade · Sewoong Oh -
2021 Poster: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2021 Poster: Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 Poster: Going Beyond Linear RL: Sample Efficient Neural Function Approximation »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Poster: LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes »
Aditya Kusupati · Matthew Wallingford · Vivek Ramanujan · Raghav Somani · Jae Sung Park · Krishna Pillutla · Prateek Jain · Sham Kakade · Ali Farhadi -
2021 Poster: Gone Fishing: Neural Active Learning with Fisher Embeddings »
Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Sham Kakade -
2021 Poster: MobILE: Model-Based Imitation Learning From Observation Alone »
Rahul Kidambi · Jonathan Chang · Wen Sun -
2021 Oral: Bellman-consistent Pessimism for Offline Reinforcement Learning »
Tengyang Xie · Ching-An Cheng · Nan Jiang · Paul Mineiro · Alekh Agarwal -
2021 Poster: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Oral: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A »
Sham M Kakade · Martha White · Nicolas Le Roux -
2020 Poster: Robust Meta-learning for Mixed Linear Regression with Small Batches »
Weihao Kong · Raghav Somani · Sham Kakade · Sewoong Oh -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: Policy Improvement via Imitation of Multiple Oracles »
Ching-An Cheng · Andrey Kolobov · Alekh Agarwal -
2020 Spotlight: Policy Improvement via Imitation of Multiple Oracles »
Ching-An Cheng · Andrey Kolobov · Alekh Agarwal -
2020 Poster: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs »
Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun -
2020 Poster: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates »
Wenhao Luo · Wen Sun · Ashish Kapoor -
2020 Poster: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Spotlight: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Spotlight: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates »
Wenhao Luo · Wen Sun · Ashish Kapoor -
2020 Oral: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs »
Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun -
2020 Poster: Safe Reinforcement Learning via Curriculum Induction »
Matteo Turchetta · Andrey Kolobov · Shital Shah · Andreas Krause · Alekh Agarwal -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Poster: Provably Good Batch Reinforcement Learning Without Great Exploration »
Yao Liu · Adith Swaminathan · Alekh Agarwal · Emma Brunskill -
2020 Poster: Information Theoretic Regret Bounds for Online Nonlinear Control »
Sham Kakade · Akshay Krishnamurthy · Kendall Lowrey · Motoya Ohnishi · Wen Sun -
2020 Spotlight: Safe Reinforcement Learning via Curriculum Induction »
Matteo Turchetta · Andrey Kolobov · Shital Shah · Andreas Krause · Alekh Agarwal -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning »
Sham M Kakade · Martha White · Nicolas Le Roux -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : The Provable Effectiveness of Policy Gradient Methods in Reinforcement Learning »
Sham Kakade -
2019 Poster: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares »
Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli -
2019 Poster: Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting »
Aditya Grover · Jiaming Song · Ashish Kapoor · Kenneth Tran · Alekh Agarwal · Eric Horvitz · Stefano Ermon -
2019 Poster: Meta-Learning with Implicit Gradients »
Aravind Rajeswaran · Chelsea Finn · Sham Kakade · Sergey Levine -
2018 : Drew Bagnell / Wen Sun »
James Bagnell · Wen Sun -
2018 : Coffee Break and Poster Session I »
Pim de Haan · Bin Wang · Dequan Wang · Aadil Hayat · Ibrahim Sobh · Muhammad Asif Rana · Thibault Buhet · Nicholas Rhinehart · Arjun Sharma · Alex Bewley · Michael Kelly · Lionel Blondé · Ozgur S. Oguz · Vaibhav Viswanathan · Jeroen Vanbaar · Konrad Żołna · Negar Rostamzadeh · Rowan McAllister · Sanjay Thakur · Alexandros Kalousis · Chelsea Sidrane · Sujoy Paul · Daphne Chen · Michal Garmulewicz · Henryk Michalewski · Coline Devin · Hongyu Ren · Jiaming Song · Wen Sun · Hanzhang Hu · Wulong Liu · Emilie Wirbel -
2018 Poster: A Smoother Way to Train Structured Prediction Models »
Krishna Pillutla · Vincent Roulet · Sham Kakade · Zaid Harchaoui -
2018 Poster: Dual Policy Iteration »
Wen Sun · Geoffrey Gordon · Byron Boots · J. Bagnell -
2018 Poster: On Oracle-Efficient PAC RL with Rich Observations »
Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2018 Spotlight: On Oracle-Efficient PAC RL with Rich Observations »
Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2018 Poster: Provably Correct Automatic Sub-Differentiation for Qualified Programs »
Sham Kakade · Jason Lee -
2017 Workshop: OPT 2017: Optimization for Machine Learning »
Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht -
2017 Poster: Off-policy evaluation for slate recommendation »
Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni -
2017 Oral: Off-policy evaluation for slate recommendation »
Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni -
2017 Poster: Learning Overcomplete HMMs »
Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant -
2017 Poster: Predictive-State Decoders: Encoding the Future into Recurrent Networks »
Arun Venkatraman · Nicholas Rhinehart · Wen Sun · Lerrel Pinto · Martial Hebert · Byron Boots · Kris Kitani · J. Bagnell -
2017 Poster: Towards Generalization and Simplicity in Continuous Control »
Aravind Rajeswaran · Kendall Lowrey · Emanuel Todorov · Sham Kakade -
2016 Poster: Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent »
Chi Jin · Sham Kakade · Praneeth Netrapalli -
2016 Demonstration: Project Malmo - Minecraft for AI Research »
Katja Hofmann · Matthew A Johnson · Fernando Diaz · Alekh Agarwal · Tim Hutton · David Bignell · Evelyne Viegas -
2016 Poster: Efficient Second Order Online Learning by Sketching »
Haipeng Luo · Alekh Agarwal · Nicolò Cesa-Bianchi · John Langford -
2016 Poster: Contextual semibandits via supervised learning oracles »
Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik -
2016 Poster: PAC Reinforcement Learning with Rich Observations »
Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2015 Workshop: Optimization for Machine Learning (OPT2015) »
Suvrit Sra · Alekh Agarwal · Leon Bottou · Sashank J. Reddi -
2015 Poster: Efficient and Parsimonious Agnostic Active Learning »
Tzu-Kuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire -
2015 Spotlight: Efficient and Parsimonious Agnostic Active Learning »
Tzu-Kuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire -
2015 Poster: Convergence Rates of Active Learning for Maximum Likelihood Estimation »
Kamalika Chaudhuri · Sham Kakade · Praneeth Netrapalli · Sujay Sanghavi -
2015 Poster: Super-Resolution Off the Grid »
Qingqing Huang · Sham Kakade -
2015 Poster: Fast Convergence of Regularized Learning in Games »
Vasilis Syrgkanis · Alekh Agarwal · Haipeng Luo · Robert Schapire -
2015 Oral: Fast Convergence of Regularized Learning in Games »
Vasilis Syrgkanis · Alekh Agarwal · Haipeng Luo · Robert Schapire -
2015 Spotlight: Super-Resolution Off the Grid »
Qingqing Huang · Sham Kakade -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2014 Poster: Scalable Non-linear Learning with Adaptive Polynomial Expansions »
Alekh Agarwal · Alina Beygelzimer · Daniel Hsu · John Langford · Matus J Telgarsky -
2013 Workshop: Learning Faster From Easy Data »
Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck -
2013 Workshop: OPT2013: Optimization for Machine Learning »
Suvrit Sra · Alekh Agarwal -
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade -
2012 Workshop: Optimization for Machine Learning »
Suvrit Sra · Alekh Agarwal -
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade -
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu -
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2012 Poster: Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions »
Alekh Agarwal · Sahand N Negahban · Martin J Wainwright -
2011 Workshop: Computational Trade-offs in Statistical Learning »
Alekh Agarwal · Sasha Rakhlin -
2011 Poster: Distributed Delayed Stochastic Optimization »
Alekh Agarwal · John Duchi -
2011 Poster: Stochastic convex optimization with bandit feedback »
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin -
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang -
2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression »
Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir -
2010 Workshop: Learning on Cores, Clusters, and Clouds »
Alekh Agarwal · Lawrence Cayton · Ofer Dekel · John Duchi · John Langford -
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade -
2010 Spotlight: Distributed Dual Averaging In Networks »
John Duchi · Alekh Agarwal · Martin J Wainwright -
2010 Poster: Distributed Dual Averaging In Networks »
John Duchi · Alekh Agarwal · Martin J Wainwright -
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade -
2010 Oral: Fast global convergence rates of gradient methods for high-dimensional statistical recovery »
Alekh Agarwal · Sahand N Negahban · Martin J Wainwright -
2010 Poster: Fast global convergence rates of gradient methods for high-dimensional statistical recovery »
Alekh Agarwal · Sahand N Negahban · Martin J Wainwright -
2009 Poster: Information-theoretic lower bounds on the oracle complexity of convex optimization »
Alekh Agarwal · Peter Bartlett · Pradeep Ravikumar · Martin J Wainwright -
2009 Spotlight: Information-theoretic lower bounds on the oracle complexity of convex optimization »
Alekh Agarwal · Peter Bartlett · Pradeep Ravikumar · Martin J Wainwright -
2009 Poster: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2009 Oral: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2008 Poster: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization »
Sham M Kakade · Karthik Sridharan · Ambuj Tewari -
2007 Poster: An Analysis of Inference with the Universum »
Fabian H Sinz · Olivier Chapelle · Alekh Agarwal · Bernhard Schölkopf -
2007 Spotlight: An Analysis of Inference with the Universum »
Fabian H Sinz · Olivier Chapelle · Alekh Agarwal · Bernhard Schölkopf -
2007 Oral: The Price of Bandit Information for Online Optimization »
Varsha Dani · Thomas P Hayes · Sham M Kakade -
2007 Poster: The Price of Bandit Information for Online Optimization »
Varsha Dani · Thomas P Hayes · Sham M Kakade