Exponential Family Model-Based Reinforcement Learning via Score Matching
Gene Li · Junbo Li · Nathan Srebro · Zhaoran Wang · Zhuoran Yang
Event URL: https://openreview.net/forum?id=9GqTPzU1va
We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. SMRL achieves $\tilde O(d\sqrt{H^3T})$ regret, where $H$ is the length of each episode and $T$ is the total number of interactions.
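To illustrate why score matching reduces to ridge regression for an exponential family, here is a minimal sketch. It is not the SMRL algorithm itself: it fits an unconditional density $p_\theta(x) \propto q(x)\exp(\theta^\top \psi(x))$ rather than a transition model, and it omits optimism and planning. For this family the score matching objective is quadratic in $\theta$, namely $\tfrac12\theta^\top V\theta + \theta^\top b$ plus a constant, so adding a ridge penalty $\tfrac{\lambda}{2}\|\theta\|^2$ gives the closed form $\hat\theta = -(V + \lambda I)^{-1} b$ with no need for the normalizing constant. The function name `score_matching_ridge`, its arguments, and the Gaussian test case are assumptions made for this example.

```python
import numpy as np

def score_matching_ridge(jac, lap, grad_log_q, lam=1e-6):
    """Ridge-regularized score matching for p(x) ∝ q(x) exp(theta^T psi(x)).

    jac:        (n, d, m) array, jac[t, i, j] = d psi_i / d x_j at sample t
    lap:        (n, d) array, Laplacian of each psi_i at sample t
    grad_log_q: (n, m) array, gradient of log q at sample t
    lam:        ridge penalty (hypothetical default)
    """
    n, d, _ = jac.shape
    # V = empirical mean of J J^T, the quadratic term of the objective.
    V = np.einsum("nij,nkj->ik", jac, jac) / n
    # b = empirical mean of J grad(log q) + Laplacian(psi), the linear term.
    b = (np.einsum("nij,nj->i", jac, grad_log_q) + lap.sum(axis=0)) / n
    # Minimizer of 0.5 * theta^T V theta + theta^T b + 0.5 * lam * ||theta||^2.
    return -np.linalg.solve(V + lam * np.eye(d), b)

# Toy check: 1-D Gaussian written as an exponential family with
# psi(x) = (x, -x^2 / 2) and q(x) = 1, so theta = (mean / var, 1 / var).
rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, size=50_000)
jac = np.stack([np.ones_like(x), -x], axis=1)[:, :, None]     # shape (n, 2, 1)
lap = np.stack([np.zeros_like(x), -np.ones_like(x)], axis=1)  # shape (n, 2)
grad_log_q = np.zeros((x.size, 1))                            # q is constant
theta_hat = score_matching_ridge(jac, lap, grad_log_q)
print(theta_hat)
```

With mean 2 and standard deviation 0.5, the estimate should land close to $(\text{mean}/\text{var},\, 1/\text{var}) = (8, 4)$, up to sampling noise and the small ridge bias.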
Author Information
Gene Li (Toyota Technological Institute at Chicago)
Junbo Li (UC Santa Cruz)
Nathan Srebro (University of Toronto)
Zhaoran Wang (Princeton University)
Zhuoran Yang (Princeton)
More from the Same Authors
- 2021 Spotlight: On the Power of Differentiable Learning versus PAC and SQ Learning
  Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro
- 2021 : GPU-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning
  Xiao-Yang Liu · Zhuoran Yang · Zhaoran Wang · Anwar Walid · Jian Guo · Michael Jordan
- 2022 Panel: Panel 6B-3: Exponential Family Model-Based… & Deep Generalized Schrödinger…
  Guan-Horng Liu · Gene Li
- 2022 Poster: Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets
  Gene Li · Cong Ma · Nati Srebro
- 2022 Poster: Understanding the Eluder Dimension
  Gene Li · Pritish Kamath · Dylan J Foster · Nati Srebro
- 2022 Poster: Exponential Family Model-Based Reinforcement Learning via Score Matching
  Gene Li · Junbo Li · Anmol Kabra · Nati Srebro · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: On the Power of Differentiable Learning versus PAC and SQ Learning
  Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro
- 2021 Oral: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting
  Frederic Koehler · Lijia Zhou · Danica J. Sutherland · Nathan Srebro
- 2021 Poster: Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
  Minshuo Chen · Yan Li · Ethan Wang · Zhuoran Yang · Zhaoran Wang · Tuo Zhao
- 2021 Poster: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting
  Frederic Koehler · Lijia Zhou · Danica J. Sutherland · Nathan Srebro
- 2021 Poster: Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
  Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang
- 2021 Poster: Representation Costs of Linear Neural Networks: Analysis and Design
  Zhen Dai · Mina Karzand · Nathan Srebro
- 2021 Poster: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
  Prashant Khanduri · Siliang Zeng · Mingyi Hong · Hoi-To Wai · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: BooVI: Provably Efficient Bootstrapped Value Iteration
  Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
  Yufeng Zhang · Siyu Chen · Zhuoran Yang · Michael Jordan · Zhaoran Wang
- 2021 Poster: Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration
  Runzhe Wu · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning
  Blake Woodworth · Nathan Srebro
- 2021 Poster: A Stochastic Newton Algorithm for Distributed Convex Optimization
  Brian Bullins · Kshitij Patel · Ohad Shamir · Nathan Srebro · Blake Woodworth
- 2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration
  Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang
- 2021 Poster: Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
  Lingxiao Wang · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
  Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou
- 2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
  Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
- 2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
  Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
- 2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning
  Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong
- 2020 Poster: Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach
  Luofeng Liao · You-Lin Chen · Zhuoran Yang · Bo Dai · Mladen Kolar · Zhaoran Wang
- 2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments
  Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie
- 2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
  Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan
- 2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
  Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang
- 2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
  Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie
- 2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
  Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie
- 2016 Poster: NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization
  Davood Hajinezhad · Mingyi Hong · Tuo Zhao · Zhaoran Wang
- 2016 Poster: Agnostic Estimation for Misspecified Phase Retrieval Models
  Matey Neykov · Zhaoran Wang · Han Liu
- 2016 Poster: Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
  Chris Junchi Li · Zhaoran Wang · Han Liu
- 2016 Poster: Blind Attacks on Machine Learners
  Alex Beatson · Zhaoran Wang · Han Liu
- 2016 Poster: More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
  Xinyang Yi · Zhaoran Wang · Zhuoran Yang · Constantine Caramanis · Han Liu
- 2015 Poster: Optimal Linear Estimation under Unknown Nonlinear Transform
  Xinyang Yi · Zhaoran Wang · Constantine Caramanis · Han Liu
- 2015 Poster: Non-convex Statistical Optimization for Sparse Tensor Graphical Model
  Wei Sun · Zhaoran Wang · Han Liu · Guang Cheng
- 2015 Poster: High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality
  Zhaoran Wang · Quanquan Gu · Yang Ning · Han Liu
- 2015 Poster: A Nonconvex Optimization Framework for Low Rank Matrix Estimation
  Tuo Zhao · Zhaoran Wang · Han Liu
- 2014 Poster: Sparse PCA with Oracle Property
  Quanquan Gu · Zhaoran Wang · Han Liu
- 2014 Poster: Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time
  Zhaoran Wang · Huanran Lu · Han Liu