Timezone: »
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration,bellemare2016unifying], we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.
Author Information
Fei Feng (University of California, Los Angeles)
Ruosong Wang (Carnegie Mellon University)
Wotao Yin (Alibaba US, DAMO Academy)
Simon Du (Institute for Advanced Study)
Lin Yang (UCLA)
Related Events (a corresponding poster, oral, or spotlight)
-
2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Tue. Dec 8th 05:00 -- 07:00 AM Room Poster Session 0 #169
More from the Same Authors
-
2021 Spotlight: Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems »
Tianyi Chen · Yuejiao Sun · Wotao Yin -
2021 : Practice-Consistent Analysis of Adam-Style Methods »
Zhishuai Guo · Yi Xu · Wotao Yin · Rong Jin · Tianbao Yang -
2021 : Doubly Pessimistic Algorithms for Strictly Safe Off-Policy Optimization »
Sanae Amani · Lin Yang -
2022 : From Local to Global: Spectral-Inspired Graph Neural Networks »
Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman -
2023 Poster: Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds »
Jiayi Huang · Han Zhong · Liwei Wang · Lin Yang -
2023 Poster: Efficient Batched Algorithm for Contextual Linear Bandits with Large Action Space via Soft Elimination »
Osama Hanna · Lin Yang · Christina Fragouli -
2023 Poster: Replicability in Reinforcement Learning »
Amin Karbasi · Grigoris Velegkas · Lin Yang · Felix Zhou -
2022 Poster: Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning »
Dingwen Kong · Lin Yang -
2022 Poster: Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context »
Osama Hanna · Lin Yang · Christina Fragouli -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2021 Poster: Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems »
Tianyi Chen · Yuejiao Sun · Wotao Yin -
2021 Poster: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2021 Poster: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs »
Han Zhong · Jiayi Huang · Lin Yang · Liwei Wang -
2021 Poster: Hyperparameter Tuning is All You Need for LISTA »
Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin -
2021 Poster: On the Value of Interaction and Function Approximation in Imitation Learning »
Nived Rajaraman · Yanjun Han · Lin Yang · Jingbo Liu · Jiantao Jiao · Kannan Ramchandran -
2021 Poster: Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection »
HanQin Cai · Jialin Liu · Wotao Yin -
2021 Poster: Exponential Graph is Provably Efficient for Decentralized Deep Training »
Bicheng Ying · Kun Yuan · Yiming Chen · Hanbin Hu · PAN PAN · Wotao Yin -
2021 Poster: Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2021 Poster: An Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders »
Xinmeng Huang · Kun Yuan · Xianghui Mao · Wotao Yin -
2021 Oral: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2020 : Contributed Talk 6: What are the Statistical Limits for Batch RL with Linear Function Approximation? »
Ruosong Wang -
2020 Poster: Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality »
Yi Zhang · Orestis Plevrakis · Simon Du · Xingguo Li · Zhao Song · Sanjeev Arora -
2020 Poster: An Improved Analysis of Stochastic Gradient Descent with Momentum »
Yanli Liu · Yuan Gao · Wotao Yin -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: Toward the Fundamental Limits of Imitation Learning »
Nived Rajaraman · Lin Yang · Jiantao Jiao · Kannan Ramchandran -
2020 Poster: Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? »
Qiwen Cui · Lin Yang -
2020 Poster: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Spotlight: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Poster: Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity »
Simon Du · Jason Lee · Gaurav Mahajan · Ruosong Wang -
2020 Poster: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods »
Yanli Liu · Kaiqing Zhang · Tamer Basar · Wotao Yin -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension »
Ruosong Wang · Russ Salakhutdinov · Lin Yang -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Late-Breaking Papers (Talks) »
David Silver · Simon Du · Matthias Plappert -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Towards Understanding the Importance of Shortcut Connections in Residual Networks »
Tianyi Liu · Minshuo Chen · Mo Zhou · Simon Du · Enlu Zhou · Tuo Zhao -
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels »
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu -
2019 Poster: Efficient Symmetric Norm Regression via Linear Sketching »
Zhao Song · Ruosong Wang · Lin Yang · Hongyang Zhang · Peilin Zhong -
2019 Poster: Acceleration via Symplectic Discretization of High-Resolution Differential Equations »
Bin Shi · Simon Du · Weijie Su · Michael Jordan -
2019 Poster: Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle »
Simon Du · Yuping Luo · Ruosong Wang · Hanrui Zhang -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang