Poster
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan
The classical theory of reinforcement learning (RL) has focused on tabular
and linear representations of value functions. Further progress hinges on
combining RL with modern function approximators such as kernel functions and
deep neural networks, and indeed there have been many empirical successes
that have exploited such combinations in large-scale applications. There
are profound challenges, however, in developing a theory to support this
enterprise, most notably the need to take into consideration the exploration-exploitation
tradeoff at the core of RL in conjunction with the computational and statistical
tradeoffs that arise in modern function-approximation-based learning systems.
We approach these challenges by studying an optimistic modification of the
least-squares value iteration algorithm, in the context of the action-value function
represented by a kernel function or an overparameterized neural network.
We establish both polynomial runtime complexity and polynomial sample complexity
for this algorithm, without additional assumptions on the data-generating model.
In particular, we prove that the algorithm incurs an $\tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T})$ regret, where $\delta_{\mathcal{F}}$ characterizes the intrinsic complexity of the function class $\mathcal{F}$, $H$ is the length of each episode, and $T$ is the total number of episodes. Our regret bounds are independent of the number of states,
a result which exhibits clearly the benefit of function approximation in RL.
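As a rough illustration of what "optimistic" least-squares value iteration with a kernel function can look like, the sketch below fits a kernel ridge regression to value targets and adds an uncertainty bonus before truncating to $[0, H]$. It is not taken from the paper: the RBF kernel, the function names `rbf_kernel` and `optimistic_q`, and the parameters `lam`, `beta`, and `lengthscale` are all assumptions chosen for the example, and the bonus uses the standard kernel-UCB width rather than any constant derived by the authors.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def optimistic_q(Z_train, y_train, Z_query, lam=1.0, beta=1.0, H=1.0):
    """Kernel ridge estimate plus an exploration bonus, truncated to [0, H].

    Z_train: (n, d) state-action pairs observed so far (hypothetical encoding)
    y_train: (n,) regression targets, e.g. reward plus next-step value estimate
    Z_query: (m, d) state-action pairs to evaluate
    """
    K = rbf_kernel(Z_train, Z_train)
    k_q = rbf_kernel(Z_train, Z_query)            # shape (n, m)
    A = K + lam * np.eye(len(Z_train))

    # Least-squares (ridge) step: mean prediction at the query points.
    alpha = np.linalg.solve(A, y_train)
    mean = k_q.T @ alpha

    # Uncertainty width: k(z, z) - k_z^T (K + lam I)^{-1} k_z, scaled by 1/lam.
    k_qq = np.ones(Z_query.shape[0])              # an RBF kernel has k(z, z) = 1
    var = k_qq - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    bonus = beta * np.sqrt(np.maximum(var, 0.0) / lam)

    # Optimism: add the bonus, then keep the estimate in the valid value range.
    return np.clip(mean + bonus, 0.0, H)
```

In this sketch the bonus shrinks near state-action pairs that have already been observed and stays large far from them, which is what steers exploration toward poorly understood regions; setting `beta=0` recovers plain kernel ridge regression.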
Author Information
Zhuoran Yang (Princeton University)
Chi Jin (Princeton University)
Zhaoran Wang (Northwestern University)
Mengdi Wang (Princeton University)
Mengdi Wang is interested in data-driven stochastic optimization and its applications in machine learning and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years).
Michael Jordan (UC Berkeley)
More from the Same Authors
-
2020 Poster: Projection Robust Wasserstein Distance and Riemannian Optimization »
Tianyi Lin · Chenyou Fan · Nhat Ho · Marco Cuturi · Michael Jordan -
2020 Poster: Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm »
Tianyi Lin · Nhat Ho · Xi Chen · Marco Cuturi · Michael Jordan -
2020 Poster: Generalized Leverage Score Sampling for Neural Networks »
Jason Lee · Ruoqi Shen · Zhao Song · Mengdi Wang · Zheng Yu -
2020 Spotlight: Projection Robust Wasserstein Distance and Riemannian Optimization »
Tianyi Lin · Chenyou Fan · Nhat Ho · Marco Cuturi · Michael Jordan -
2020 Poster: Decision-Making with Auto-Encoding Variational Bayes »
Romain Lopez · Pierre Boyeau · Nir Yosef · Michael Jordan · Jeffrey Regier -
2020 Poster: High-Dimensional Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Mengdi Wang -
2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework »
Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou -
2020 Poster: Transferable Calibration with Lower Bias and Variance in Domain Adaptation »
Ximei Wang · Mingsheng Long · Jianmin Wang · Michael Jordan -
2020 Poster: Robust Optimization for Fairness with Noisy Protected Groups »
Serena Wang · Wenshuo Guo · Harikrishna Narasimhan · Andrew Cotter · Maya Gupta · Michael Jordan -
2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2020 Poster: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Poster: On the Theory of Transfer Learning: The Importance of Task Diversity »
Nilesh Tripuraneni · Michael Jordan · Chi Jin -
2020 Poster: Near-Optimal Reinforcement Learning with Self-Play »
Yu Bai · Chi Jin · Tiancheng Yu -
2020 Poster: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Poster: End-to-End Learning and Intervention in Games »
Jiayang Li · Jing Yu · Yu Nie · Zhaoran Wang -
2020 Spotlight: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Spotlight: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Poster: Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach »
Luofeng Liao · You-Lin Chen · Zhuoran Yang · Bo Dai · Mladen Kolar · Zhaoran Wang -
2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie -
2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss »
Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang -
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2019 Poster: Statistical-Computational Tradeoff in Single Index Models »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2019 Poster: Transferable Normalization: Towards Improving Transferability of Deep Neural Networks »
Ximei Wang · Ying Jin · Mingsheng Long · Jianmin Wang · Michael Jordan -
2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost »
Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang -
2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation »
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang -
2019 Poster: Convergent Policy Optimization for Safe Reinforcement Learning »
Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang -
2019 Poster: State Aggregation Learning from Markov Transition Data »
Yaqi Duan · Tracy Ke · Mengdi Wang -
2019 Poster: Acceleration via Symplectic Discretization of High-Resolution Differential Equations »
Bin Shi · Simon Du · Weijie Su · Michael Jordan -
2019 Poster: Learning low-dimensional state embeddings and metastable clusters from time series data »
Yifan Sun · Yaqi Duan · Hao Gong · Mengdi Wang -
2018 Poster: Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization »
Minshuo Chen · Lin Yang · Mengdi Wang · Tuo Zhao -
2018 Poster: Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation »
Kush Bhatia · Aldo Pacchiano · Nicolas Flammarion · Peter Bartlett · Michael Jordan -
2018 Poster: Contrastive Learning from Pairwise Measurements »
Yi Chen · Zhuoran Yang · Yuchen Xie · Zhaoran Wang -
2018 Poster: Provable Gaussian Embedding with One Observation »
Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang -
2018 Poster: Theoretical guarantees for EM under misspecified Gaussian mixture models »
Raaz Dwivedi · Nhật Hồ · Koulik Khamaru · Martin Wainwright · Michael Jordan -
2018 Poster: Stochastic Cubic Regularization for Fast Nonconvex Optimization »
Nilesh Tripuraneni · Mitchell Stern · Chi Jin · Jeffrey Regier · Michael Jordan -
2018 Poster: On the Local Minima of the Empirical Risk »
Chi Jin · Lydia T. Liu · Rong Ge · Michael Jordan -
2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2018 Spotlight: On the Local Minima of the Empirical Risk »
Chi Jin · Lydia T. Liu · Rong Ge · Michael Jordan -
2018 Oral: Stochastic Cubic Regularization for Fast Nonconvex Optimization »
Nilesh Tripuraneni · Mitchell Stern · Chi Jin · Jeffrey Regier · Michael Jordan -
2018 Poster: Is Q-Learning Provably Efficient? »
Chi Jin · Zeyuan Allen-Zhu · Sebastien Bubeck · Michael Jordan -
2018 Poster: Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model »
Aaron Sidford · Mengdi Wang · Xian Wu · Lin Yang · Yinyu Ye -
2018 Poster: Information Constraints on Auto-Encoding Variational Bayes »
Romain Lopez · Jeffrey Regier · Michael Jordan · Nir Yosef -
2018 Poster: Conditional Adversarial Domain Adaptation »
Mingsheng Long · Zhangjie Cao · Jianmin Wang · Michael Jordan -
2018 Poster: Generalized Zero-Shot Learning with Deep Calibration Network »
Shichen Liu · Mingsheng Long · Jianmin Wang · Michael Jordan -
2017 Poster: Fast Black-box Variational Inference through Stochastic Trust-Region Optimization »
Jeffrey Regier · Michael Jordan · Jon McAuliffe -
2017 Poster: Online control of the false discovery rate with decaying memory »
Aaditya Ramdas · Fanny Yang · Martin Wainwright · Michael Jordan -
2017 Spotlight: Fast Black-box Variational Inference through Stochastic Trust-Region Optimization »
Jeffrey Regier · Michael Jordan · Jon McAuliffe -
2017 Oral: Online control of the false discovery rate with decaying memory »
Aaditya Ramdas · Fanny Yang · Martin Wainwright · Michael Jordan -
2017 Poster: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma »
Zhuoran Yang · Krishnakumar Balasubramanian · Zhaoran Wang · Han Liu -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Oral: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Poster: Non-convex Finite-Sum Optimization Via SCSG Methods »
Lihua Lei · Cheng Ju · Jianbo Chen · Michael Jordan -
2017 Poster: Kernel Feature Selection via Conditional Covariance Minimization »
Jianbo Chen · Mitchell Stern · Martin J Wainwright · Michael Jordan -
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher Ré · Benjamin Recht -
2016 Poster: Accelerating Stochastic Composition Optimization »
Mengdi Wang · Ji Liu · Ethan Fang -
2016 Poster: Unsupervised Domain Adaptation with Residual Transfer Networks »
Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan -
2016 Poster: Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences »
Chi Jin · Yuchen Zhang · Sivaraman Balakrishnan · Martin J Wainwright · Michael Jordan -
2015 Poster: Variational Consensus Monte Carlo »
Maxim Rabinovich · Elaine Angelino · Michael Jordan -
2015 Poster: On the Accuracy of Self-Normalized Log-Linear Models »
Jacob Andreas · Maxim Rabinovich · Michael Jordan · Dan Klein -
2015 Poster: Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes »
Ryan Giordano · Tamara Broderick · Michael Jordan -
2015 Spotlight: Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes »
Ryan Giordano · Tamara Broderick · Michael Jordan -
2014 Workshop: Advances in Variational Inference »
David Blei · Shakir Mohamed · Michael Jordan · Charles Blundell · Tamara Broderick · Matthew D. Hoffman -
2014 Poster: Communication-Efficient Distributed Dual Coordinate Ascent »
Martin Jaggi · Virginia Smith · Martin Takac · Jonathan Terhorst · Sanjay Krishnan · Thomas Hofmann · Michael Jordan -
2014 Poster: Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing »
Yuchen Zhang · Xi Chen · Denny Zhou · Michael Jordan -
2014 Poster: Parallel Double Greedy Submodular Maximization »
Xinghao Pan · Stefanie Jegelka · Joseph Gonzalez · Joseph K Bradley · Michael Jordan -
2014 Spotlight: Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing »
Yuchen Zhang · Xi Chen · Denny Zhou · Michael Jordan -
2014 Poster: On the Convergence Rate of Decomposable Submodular Function Minimization »
Robert Nishihara · Stefanie Jegelka · Michael Jordan -
2013 Workshop: Big Learning : Advances in Algorithms and Data Management »
Xinghao Pan · Haijie Gu · Joseph Gonzalez · Sameer Singh · Yucheng Low · Joseph Hellerstein · Derek G Murray · Raghu Ramakrishnan · Michael Jordan · Christopher Ré -
2013 Workshop: Discrete Optimization in Machine Learning: Connecting Theory and Practice »
Stefanie Jegelka · Andreas Krause · Pradeep Ravikumar · Kazuo Murota · Jeffrey A Bilmes · Yisong Yue · Michael Jordan -
2013 Session: Oral Session 10 »
Michael Jordan -
2013 Poster: A Comparative Framework for Preconditioned Lasso Algorithms »
Fabian L Wauthier · Nebojsa Jojic · Michael Jordan -
2013 Poster: Information-theoretic lower bounds for distributed statistical estimation with communication constraints »
Yuchen Zhang · John Duchi · Michael Jordan · Martin J Wainwright -
2013 Oral: Information-theoretic lower bounds for distributed statistical estimation with communication constraints »
Yuchen Zhang · John Duchi · Michael Jordan · Martin J Wainwright -
2013 Poster: Optimistic Concurrency Control for Distributed Unsupervised Learning »
Xinghao Pan · Joseph Gonzalez · Stefanie Jegelka · Tamara Broderick · Michael Jordan -
2013 Poster: Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation »
John Duchi · Martin J Wainwright · Michael Jordan -
2013 Poster: Streaming Variational Bayes »
Tamara Broderick · Nicholas Boyd · Andre Wibisono · Ashia C Wilson · Michael Jordan -
2013 Poster: Estimation, Optimization, and Parallelism when Data is Sparse »
John Duchi · Michael Jordan · Brendan McMahan -
2012 Workshop: Bayesian Nonparametric Models For Reliable Planning And Decision-Making Under Uncertainty »
Jonathan How · Lawrence Carin · John Fisher III · Michael Jordan · Alborz Geramifard -
2012 Poster: Privacy Aware Learning »
John Duchi · Michael Jordan · Martin J Wainwright -
2012 Poster: Ancestor Sampling for Particle Gibbs »
Fredrik Lindsten · Michael Jordan · Thomas Schön -
2012 Oral: Privacy Aware Learning »
John Duchi · Michael Jordan · Martin J Wainwright -
2012 Poster: Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods »
John Duchi · Michael Jordan · Martin J Wainwright · Andre Wibisono -
2012 Poster: Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models »
Ke Jiang · Brian Kulis · Michael Jordan -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Poster: Bayesian Bias Mitigation for Crowdsourcing »
Fabian L Wauthier · Michael Jordan -
2011 Poster: Divide-and-Conquer Matrix Factorization »
Lester W Mackey · Ameet S Talwalkar · Michael Jordan -
2010 Oral: Tree-Structured Stick Breaking for Hierarchical Data »
Ryan Adams · Zoubin Ghahramani · Michael Jordan -
2010 Invited Talk (Posner Lecture): Statistical Inference of Protein Structure and Function »
Michael Jordan -
2010 Poster: Tree-Structured Stick Breaking for Hierarchical Data »
Ryan Adams · Zoubin Ghahramani · Michael Jordan -
2010 Spotlight: Variational Inference over Combinatorial Spaces »
Alexandre Bouchard-Côté · Michael Jordan -
2010 Poster: Variational Inference over Combinatorial Spaces »
Alexandre Bouchard-Côté · Michael Jordan -
2010 Poster: Unsupervised Kernel Dimension Reduction »
Meihong Wang · Fei Sha · Michael Jordan -
2010 Poster: Heavy-Tailed Process Priors for Selective Shrinkage »
Fabian L Wauthier · Michael Jordan -
2010 Poster: Random Conic Pursuit for Semidefinite Programming »
Ariel Kleiner · Ali Rahimi · Michael Jordan -
2009 Workshop: Nonparametric Bayes »
Dilan Gorur · Francois Caron · Yee Whye Teh · David B Dunson · Zoubin Ghahramani · Michael Jordan -
2009 Poster: Sharing Features among Dynamical Systems with Beta Processes »
Emily Fox · Erik Sudderth · Michael Jordan · Alan S Willsky -
2009 Oral: Sharing Features among Dynamical Systems with Beta Processes »
Emily Fox · Erik Sudderth · Michael Jordan · Alan S Willsky -
2009 Poster: Asymptotically Optimal Regularization in Smooth Parametric Models »
Percy Liang · Francis Bach · Guillaume Bouchard · Michael Jordan -
2009 Poster: Nonparametric Latent Feature Models for Link Prediction »
Kurt T Miller · Tom Griffiths · Michael Jordan -
2009 Spotlight: Nonparametric Latent Feature Models for Link Prediction »
Kurt T Miller · Tom Griffiths · Michael Jordan -
2008 Oral: Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes »
Erik Sudderth · Michael Jordan -
2008 Poster: Nonparametric Bayesian Learning of Switching Linear Dynamical Systems »
Emily Fox · Erik Sudderth · Michael Jordan · Alan S Willsky -
2008 Poster: High-dimensional union support recovery in multivariate regression »
Guillaume R Obozinski · Martin J Wainwright · Michael Jordan -
2008 Poster: Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes »
Erik Sudderth · Michael Jordan -
2008 Spotlight: High-dimensional union support recovery in multivariate regression »
Guillaume R Obozinski · Martin J Wainwright · Michael Jordan -
2008 Spotlight: Nonparametric Bayesian Learning of Switching Linear Dynamical Systems »
Emily Fox · Erik Sudderth · Michael Jordan · Alan S Willsky -
2008 Poster: Posterior Consistency of the Silverman g-prior in Bayesian Model Choice »
Zhihua Zhang · Michael Jordan · Dit-Yan Yeung -
2008 Poster: DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification »
Simon Lacoste-Julien · Fei Sha · Michael Jordan -
2008 Spotlight: Posterior Consistency of the Silverman g-prior in Bayesian Model Choice »
Zhihua Zhang · Michael Jordan · Dit-Yan Yeung -
2008 Poster: Efficient Inference in Phylogenetic InDel Trees »
Alexandre Bouchard-Côté · Michael Jordan · Dan Klein -
2008 Poster: Spectral Clustering with Perturbed Data »
Ling Huang · Donghui Yan · Michael Jordan · Nina Taft -
2008 Spotlight: Efficient Inference in Phylogenetic InDel Trees »
Alexandre Bouchard-Côté · Michael Jordan · Dan Klein -
2008 Spotlight: Spectral Clustering with Perturbed Data »
Ling Huang · Donghui Yan · Michael Jordan · Nina Taft -
2007 Poster: Agreement-Based Learning »
Percy Liang · Dan Klein · Michael Jordan -
2007 Spotlight: Agreement-Based Learning »
Percy Liang · Dan Klein · Michael Jordan -
2007 Spotlight: Resampling Methods for Protein Structure Prediction with Rosetta »
Ben Blum · David Baker · Michael Jordan · Philip Bradley · Rhiju Das · David Kim -
2007 Spotlight: Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization »
XuanLong Nguyen · Martin J Wainwright · Michael Jordan -
2007 Poster: Resampling Methods for Protein Structure Prediction with Rosetta »
Ben Blum · David Baker · Michael Jordan · Philip Bradley · Rhiju Das · David Kim -
2007 Poster: Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization »
XuanLong Nguyen · Martin J Wainwright · Michael Jordan -
2006 Poster: Distributed PCA and Network Anomaly Detection »
Ling Huang · XuanLong Nguyen · Minos Garofalakis · Michael Jordan · Anthony D Joseph · Nina Taft