Poster
Exponential Family Model-Based Reinforcement Learning via Score Matching
Gene Li · Junbo Li · Anmol Kabra · Nati Srebro · Zhaoran Wang · Zhuoran Yang
We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. Under standard regularity assumptions, SMRL achieves $\tilde O(d\sqrt{H^3T})$ online regret, where $H$ is the length of each episode and $T$ is the total number of interactions (ignoring polynomial dependence on structural scale parameters).
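As a rough illustration of why score matching reduces parameter estimation to ridge regression, consider an unnormalized exponential family density $p_\theta(x) \propto \exp(\theta^\top \psi(x))$. The score matching objective is quadratic in $\theta$, so its regularized minimizer has the closed form $\hat\theta = -(\hat V + \lambda I)^{-1} \hat b$, where $\hat V$ averages the outer products of the gradient of $\psi$ and $\hat b$ averages the Laplacian of $\psi$. The sketch below is not the SMRL algorithm: the function name, the toy one-dimensional Gaussian, the feature map $\psi(x) = (x, x^2)$, and the regularizer `lam` are all assumptions chosen for the example.

```python
import numpy as np

def score_matching_gaussian(x, lam=1e-6):
    """Hypothetical helper: fit natural parameters of p(x) ∝ exp(theta^T psi(x)),
    psi(x) = (x, x^2), by ridge-regularized score matching."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Derivative of psi with respect to x is (1, 2x); one row per sample.
    jac = np.stack([np.ones(n), 2.0 * x], axis=1)
    # Second derivative (Laplacian in 1-D) of psi is (0, 2) for every sample.
    lap = np.array([0.0, 2.0])
    # Empirical objective: 0.5 * theta^T V theta + theta^T b + 0.5 * lam * ||theta||^2
    V = jac.T @ jac / n
    b = lap
    # Closed-form minimizer, the same linear system as ridge regression.
    return -np.linalg.solve(V + lam * np.eye(2), b)

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0  # assumed ground truth for the toy example
samples = rng.normal(mu, sigma, size=200_000)

theta_hat = score_matching_gaussian(samples)
# Natural parameters of N(mu, sigma^2) under psi(x) = (x, x^2).
theta_true = np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])
print(theta_hat, theta_true)
```

No normalizing constant is ever computed here, which is the property the abstract refers to as unnormalized density estimation. In the paper's setting the density is a transition model conditioned on state and action, so the actual estimator has more structure than this toy case.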
Author Information
Gene Li (Toyota Technological Institute at Chicago)
Junbo Li (University of California, Santa Cruz)
Anmol Kabra (TTIC)
Nati Srebro (TTI-Chicago)
Zhaoran Wang (Northwestern University)
Zhuoran Yang (Yale University)
More from the Same Authors
-
2021 Spotlight: On the Power of Differentiable Learning versus PAC and SQ Learning »
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro -
2021 : Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Nathan Srebro · Zhaoran Wang · Zhuoran Yang -
2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 : Distributed Online and Bandit Convex Optimization »
Kumar Kshitij Patel · Aadirupa Saha · Nati Srebro · Lingxiao Wang -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 : Mitigating Lies in Vision-Language Models »
Junbo Li · Xianhang Li · Cihang Xie -
2023 Poster: The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks »
Spencer Frei · Gal Vardi · Peter Bartlett · Nati Srebro -
2023 Poster: Most Neural Networks Are Almost Learnable »
Amit Daniely · Nati Srebro · Gal Vardi -
2023 Poster: Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning »
Haoran He · Chenjia Bai · Kang Xu · Zhuoran Yang · Weinan Zhang · Dong Wang · Bin Zhao · Xuelong Li -
2023 Poster: Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms »
Shenao Zhang · Boyi Liu · Zhaoran Wang · Tuo Zhao -
2023 Poster: Learning Regularized Monotone Graphon Mean-Field Games »
Fengzhuo Zhang · Vincent Tan · Zhaoran Wang · Zhuoran Yang -
2023 Poster: Uniform Convergence with Square-Root Lipschitz Loss »
Lijia Zhou · Zhen Dai · Frederic Koehler · Nati Srebro -
2023 Poster: Posterior Sampling for Competitive RL: Function Approximation and Partial Observation »
Shuang Qiu · Ziyu Dai · Han Zhong · Zhaoran Wang · Zhuoran Yang · Tong Zhang -
2023 Poster: Online Performative Gradient Descent for Learning Nash Equilibria in Decision-Dependent Games »
Zihan Zhu · Ethan Fang · Zhuoran Yang -
2023 Poster: One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration »
Zhihan Liu · Miao Lu · WEI XIONG · Han Zhong · Hao Hu · Shenao Zhang · Sirui Zheng · Zhuoran Yang · Zhaoran Wang -
2023 Poster: When is Agnostic Reinforcement Learning Statistically Tractable? »
Zeyu Jia · Gene Li · Alexander Rakhlin · Ayush Sekhari · Nati Srebro -
2023 Poster: Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy »
Amit Daniely · Nati Srebro · Gal Vardi -
2022 Panel: Panel 6B-3: Exponential Family Model-Based… & Deep Generalized Schrödinger… »
Guan-Horng Liu · Gene Li -
2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 Poster: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence »
Boyi Liu · Jiayang Li · Zhuoran Yang · Hoi-To Wai · Mingyi Hong · Yu Nie · Zhaoran Wang -
2022 Poster: A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models »
Lijia Zhou · Frederic Koehler · Pragya Sur · Danica J. Sutherland · Nati Srebro -
2022 Poster: On Margin Maximization in Linear and ReLU Networks »
Gal Vardi · Ohad Shamir · Nati Srebro -
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang -
2022 Poster: Towards Optimal Communication Complexity in Distributed Non-Convex Optimization »
Kumar Kshitij Patel · Lingxiao Wang · Blake Woodworth · Brian Bullins · Nati Srebro -
2022 Poster: Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization »
Idan Amir · Roi Livni · Nati Srebro -
2022 Poster: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL »
Fengzhuo Zhang · Boyi Liu · Kaixin Wang · Vincent Tan · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets »
Yifei Min · Tianhao Wang · Ruitu Xu · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Poster: The Sample Complexity of One-Hidden-Layer Neural Networks »
Gal Vardi · Ohad Shamir · Nati Srebro -
2022 Poster: Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets »
Gene Li · Cong Ma · Nati Srebro -
2022 Poster: Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization »
Omar Montasser · Steve Hanneke · Nati Srebro -
2022 Poster: Reinforcement Learning with Logarithmic Regret and Policy Switches »
Grigoris Velegkas · Zhuoran Yang · Amin Karbasi -
2022 Poster: Understanding the Eluder Dimension »
Gene Li · Pritish Kamath · Dylan J Foster · Nati Srebro -
2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning »
Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo -
2021 Poster: On the Power of Differentiable Learning versus PAC and SQ Learning »
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro -
2021 Oral: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting »
Frederic Koehler · Lijia Zhou · Danica J. Sutherland · Nathan Srebro -
2021 Poster: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting »
Frederic Koehler · Lijia Zhou · Danica J. Sutherland · Nathan Srebro -
2021 Poster: Representation Costs of Linear Neural Networks: Analysis and Design »
Zhen Dai · Mina Karzand · Nathan Srebro -
2021 Poster: An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning »
Blake Woodworth · Nathan Srebro -
2021 Poster: A Stochastic Newton Algorithm for Distributed Convex Optimization »
Brian Bullins · Kshitij Patel · Ohad Shamir · Nathan Srebro · Blake Woodworth -
2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework »
Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou -
2020 Poster: On Uniform Convergence and Low-Norm Interpolation Learning »
Lijia Zhou · Danica J. Sutherland · Nati Srebro -
2020 Poster: Reducing Adversarially Robust Learning to Non-Robust PAC Learning »
Omar Montasser · Steve Hanneke · Nati Srebro -
2020 Spotlight: On Uniform Convergence and Low-Norm Interpolation Learning »
Lijia Zhou · Danica J. Sutherland · Nati Srebro -
2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Poster: Minibatch vs Local SGD for Heterogeneous Distributed Learning »
Blake Woodworth · Kumar Kshitij Patel · Nati Srebro -
2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2020 Poster: End-to-End Learning and Intervention in Games »
Jiayang Li · Jing Yu · Yu Nie · Zhaoran Wang -
2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie -
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces »
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan -
2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss »
Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang -
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: Statistical-Computational Tradeoff in Single Index Models »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost »
Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang -
2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation »
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang -
2019 Poster: Convergent Policy Optimization for Safe Reinforcement Learning »
Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang -
2018 Poster: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro -
2018 Poster: Contrastive Learning from Pairwise Measurements »
Yi Chen · Zhuoran Yang · Yuchen Xie · Zhaoran Wang -
2018 Spotlight: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro -
2018 Poster: Provable Gaussian Embedding with One Observation »
Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang -
2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Poster: The Everlasting Database: Statistical Validity at a Fair Price »
Blake Woodworth · Vitaly Feldman · Saharon Rosset · Nati Srebro -
2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2018 Poster: On preserving non-discrimination when combining expert advice »
Avrim Blum · Suriya Gunasekar · Thodoris Lykouris · Nati Srebro -
2017 Poster: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Oral: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Poster: Stochastic Approximation for Canonical Correlation Analysis »
Raman Arora · Teodor Vanislavov Marinov · Poorya Mianjy · Nati Srebro -
2017 Poster: Exploring Generalization in Deep Learning »
Behnam Neyshabur · Srinadh Bhojanapalli · David Mcallester · Nati Srebro -
2017 Poster: Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma »
Zhuoran Yang · Krishnakumar Balasubramanian · Zhaoran Wang · Han Liu -
2017 Poster: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2017 Spotlight: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2016 Poster: Tight Complexity Bounds for Optimizing Composite Objectives »
Blake Woodworth · Nati Srebro -
2016 Poster: Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis »
Weiran Wang · Jialei Wang · Dan Garber · Nati Srebro -
2016 Poster: Global Optimality of Local Search for Low Rank Matrix Recovery »
Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2016 Poster: Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations »
Behnam Neyshabur · Yuhuai Wu · Russ Salakhutdinov · Nati Srebro -
2016 Poster: Equality of Opportunity in Supervised Learning »
Moritz Hardt · Eric Price · Nati Srebro -
2016 Poster: Normalized Spectral Map Synchronization »
Yanyao Shen · Qixing Huang · Nati Srebro · Sujay Sanghavi -
2015 Poster: Path-SGD: Path-Normalized Optimization in Deep Neural Networks »
Behnam Neyshabur · Russ Salakhutdinov · Nati Srebro -
2014 Poster: Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm »
Deanna Needell · Rachel Ward · Nati Srebro -
2013 Workshop: Learning Faster From Easy Data »
Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck -
2013 Workshop: Large Scale Matrix Analysis and Inference »
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht -
2013 Poster: Stochastic Optimization of PCA with Capped MSG »
Raman Arora · Andrew Cotter · Nati Srebro -
2013 Poster: Auditing: Active Learning with Outcome-Dependent Query Costs »
Sivan Sabato · Anand D Sarwate · Nati Srebro -
2013 Poster: The Power of Asymmetry in Binary Hashing »
Behnam Neyshabur · Nati Srebro · Russ Salakhutdinov · Yury Makarychev · Payman Yadollahpour -
2012 Poster: Sparse Prediction with the $k$-Support Norm »
Andreas Argyriou · Rina Foygel · Nati Srebro -
2012 Spotlight: Sparse Prediction with the $k$-Support Norm »
Andreas Argyriou · Rina Foygel · Nati Srebro -
2012 Poster: Matrix reconstruction with the local max norm »
Rina Foygel · Nati Srebro · Russ Salakhutdinov -
2011 Poster: Beating SGD: Learning SVMs in Sublinear Time »
Elad Hazan · Tomer Koren · Nati Srebro -
2011 Poster: Better Mini-Batch Algorithms via Accelerated Gradient Methods »
Andrew Cotter · Ohad Shamir · Nati Srebro · Karthik Sridharan -
2011 Poster: On the Universality of Online Mirror Descent »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2011 Poster: Learning with the weighted trace-norm under arbitrary sampling distributions »
Rina Foygel · Russ Salakhutdinov · Ohad Shamir · Nati Srebro -
2010 Session: Spotlights Session 11 »
Nati Srebro -
2010 Session: Oral Session 13 »
Nati Srebro -
2010 Poster: Tight Sample Complexity of Large-Margin Learning »
Sivan Sabato · Nati Srebro · Naftali Tishby -
2010 Poster: Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm »
Russ Salakhutdinov · Nati Srebro -
2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp -
2010 Poster: Smoothness, Low Noise and Fast Rates »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2009 Workshop: Understanding Multiple Kernel Learning Methods »
Brian McFee · Gert Lanckriet · Francis Bach · Nati Srebro -
2009 Poster: Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data »
Boaz Nadler · Nati Srebro · Xueyuan Zhou -
2009 Spotlight: Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data »
Boaz Nadler · Nati Srebro · Xueyuan Zhou -
2008 Poster: Fast Rates for Regularized Objectives »
Karthik Sridharan · Shai Shalev-Shwartz · Nati Srebro