Poster
Nearly Horizon-Free Offline Reinforcement Learning
Tongzheng Ren · Jialian Li · Bo Dai · Simon Du · Sujay Sanghavi
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes (MDPs). For tabular MDPs with $S$ states and $A$ actions, or linear MDPs with anchor points and feature dimension $d$, given $K$ collected episodes of data in which the minimum visiting probability of (anchor) state-action pairs is $d_m$, we obtain nearly horizon $H$-free sample complexity bounds for offline reinforcement learning when the total reward is upper bounded by 1. Specifically:
• For offline policy evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} \right)$ error bound for the plug-in estimator, which matches the lower bound up to logarithmic factors and has no additional $\mathrm{poly}(H, S, A, d)$ dependency in the higher-order term.
• For offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Kd_m}} + \frac{\min(S, d)}{Kd_m}\right)$ sub-optimality gap for the empirical optimal policy, which approaches the lower bound up to logarithmic factors and a higher-order term, improving upon the best known result by [Cui and Yang 2020], which has additional $\mathrm{poly}(H, S, d)$ factors in the main term.
To the best of our knowledge, these are the first nearly horizon-free bounds for episodic time-homogeneous offline tabular MDPs and linear MDPs with anchor points. Central to our analysis is a simple yet effective recursion-based method to bound a "total variance" term in the offline setting, which could be of independent interest.
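As a rough illustration of the plug-in estimator mentioned for offline policy evaluation in the tabular case, the sketch below estimates a time-homogeneous model from logged transitions and evaluates a policy on it by backward induction over $H$ steps. It is not taken from the paper: the function name `plug_in_policy_value`, the `(s, a, r, s_next)` trajectory format, and the choice to treat unvisited state-action pairs as zero-reward with zero transition mass are all assumptions made for this example.

```python
import numpy as np


def plug_in_policy_value(episodes, policy, S, A, H, init_state=0):
    """Plug-in value estimate for a tabular, time-homogeneous episodic MDP.

    episodes: list of trajectories, each a list of (s, a, r, s_next) tuples
              (hypothetical logging format, assumed for this sketch).
    policy:   array of shape (S, A); policy[s, a] is the probability of
              taking action a in state s.
    """
    counts = np.zeros((S, A, S))
    reward_sum = np.zeros((S, A))
    for traj in episodes:
        for s, a, r, s_next in traj:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r

    n = counts.sum(axis=2)            # visit counts per (s, a)
    visited = n > 0

    # Empirical model. Unvisited pairs keep zero reward and zero transition
    # mass, which is an assumption of this sketch, not part of the source.
    P_hat = np.zeros_like(counts)
    P_hat[visited] = counts[visited] / n[visited][:, None]
    r_hat = np.zeros((S, A))
    r_hat[visited] = reward_sum[visited] / n[visited]

    # Time-homogeneous: the same (P_hat, r_hat) is reused at every step.
    V = np.zeros(S)
    for _ in range(H):
        Q = r_hat + P_hat @ V         # shape (S, A)
        V = (policy * Q).sum(axis=1)  # expectation over the policy's actions
    return V[init_state]
```

Because the model is shared across all $H$ steps, every logged transition contributes to the same estimates of the transition kernel and reward, regardless of the step at which it was observed.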
Author Information
Tongzheng Ren (UT Austin)
Jialian Li (Tsinghua University)
Bo Dai (Google Brain)
Simon Du (University of Washington)
Sujay Sanghavi (UT Austin)
More from the Same Authors
-
2021 Spotlight: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 Spotlight: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 : Offline Policy Selection under Uncertainty »
Mengjiao (Sherry) Yang · Bo Dai · Ofir Nachum · George Tucker · Dale Schuurmans -
2022 : Differentially Private Federated Learning with Normalized Updates »
Rudrajit Das · Abolfazl Hashemi · Sujay Sanghavi · Inderjit Dhillon -
2023 Poster: Markovian Sliced Wasserstein Distances: Beyond Independent Projections »
Khai Nguyen · Tongzheng Ren · Nhat Ho -
2023 Poster: Logarithmic Bayes Regret Bounds »
Alexia Atsidakou · Branislav Kveton · Sumeet Katariya · Constantine Caramanis · Sujay Sanghavi -
2023 Poster: Designing Robust Transformers using Robust Kernel Density Estimation »
Xing Han · Tongzheng Ren · Tan Nguyen · Khai Nguyen · Joydeep Ghosh · Nhat Ho -
2022 Poster: Minimax Regret for Cascading Bandits »
Daniel Vial · Sujay Sanghavi · Sanjay Shakkottai · R. Srikant -
2022 Poster: Toward Understanding Privileged Features Distillation in Learning-to-Rank »
Shuo Yang · Sujay Sanghavi · Holakou Rahmanian · Jan Bakus · Vishwanathan S. V. N. -
2021 Poster: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 Poster: Towards understanding retrosynthesis by energy-based models »
Ruoxi Sun · Hanjun Dai · Li Li · Steven Kearnes · Bo Dai -
2021 Poster: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP »
Zihan Zhang · Jiaqi Yang · Xiangyang Ji · Simon Du -
2021 Poster: Scalable Quasi-Bayesian Inference for Instrumental Variable Regression »
Ziyu Wang · Yuhao Zhou · Tongzheng Ren · Jun Zhu -
2021 Poster: Corruption Robust Active Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Poster: Understanding the Effect of Stochasticity in Policy Optimization »
Jincheng Mei · Bo Dai · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization »
Tian Ye · Simon Du -
2020 Poster: Implicit Regularization and Convergence for Weight Normalization »
Xiaoxia Wu · Edgar Dobriban · Tongzheng Ren · Shanshan Wu · Zhiyuan Li · Suriya Gunasekar · Rachel Ward · Qiang Liu -
2020 Poster: Stein Self-Repulsive Dynamics: Benefits From Past Samples »
Mao Ye · Tongzheng Ren · Qiang Liu -
2019 Poster: Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space »
Shuo Yang · Yanyao Shen · Sujay Sanghavi -
2019 Poster: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models »
Shanshan Wu · Sujay Sanghavi · Alex Dimakis -
2019 Spotlight: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models »
Shanshan Wu · Sujay Sanghavi · Alex Dimakis -
2019 Poster: Iterative Least Trimmed Squares for Mixed Linear Regression »
Yanyao Shen · Sujay Sanghavi -
2019 Poster: Blocking Bandits »
Soumya Basu · Rajat Sen · Sujay Sanghavi · Sanjay Shakkottai -
2019 Poster: Learning Distributions Generated by One-Layer ReLU Networks »
Shanshan Wu · Alex Dimakis · Sujay Sanghavi -
2018 : Poster Session »
Sujay Sanghavi · Vatsal Shah · Yanyao Shen · Tianchen Zhao · Yuandong Tian · Tomer Galanti · Mufan Li · Gilad Cohen · Daniel Rothchild · Aristide Baratin · Devansh Arpit · Vagelis Papalexakis · Michael Perlmutter · Ashok Vardhan Makkuva · Pim de Haan · Yingyan Lin · Wanmo Kang · Cheolhyoung Lee · Hao Shen · Sho Yaida · Dan Roberts · Nadav Cohen · Philippe Casgrain · Dejiao Zhang · Tengyu Ma · Avinash Ravichandran · Julian Emilio Salazar · Bo Li · Davis Liang · Christopher Wong · Glen Bigan Mbeng · Animesh Garg -
2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh -
2018 Poster: Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification »
Harsh Shrivastava · Eugene Bart · Bob Price · Hanjun Dai · Bo Dai · Srinivas Aluru -
2018 Poster: Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced »
Simon Du · Wei Hu · Jason Lee -
2018 Poster: Coupled Variational Bayes via Optimization Embedding »
Bo Dai · Hanjun Dai · Niao He · Weiyang Liu · Zhen Liu · Jianshu Chen · Lin Xiao · Le Song -
2018 Poster: Predictive Approximate Bayesian Computation via Saddle Points »
Yingxiang Yang · Bo Dai · Negar Kiyavash · Niao He -
2018 Poster: Learning towards Minimum Hyperspherical Energy »
Weiyang Liu · Rongmei Lin · Zhen Liu · Lixin Liu · Zhiding Yu · Bo Dai · Le Song -
2017 Poster: Hypothesis Transfer Learning via Transformation Functions »
Simon Du · Jayanth Koushik · Aarti Singh · Barnabas Poczos -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: On the Power of Truncated SVD for General High-rank Matrix Estimation Problems »
Simon Du · Yining Wang · Aarti Singh -
2016 Poster: Single Pass PCA of Matrix Products »
Shanshan Wu · Srinadh Bhojanapalli · Sujay Sanghavi · Alex Dimakis -
2016 Poster: Conditional Generative Moment-Matching Networks »
Yong Ren · Jun Zhu · Jialian Li · Yucen Luo -
2016 Poster: Normalized Spectral Map Synchronization »
Yanyao Shen · Qixing Huang · Nati Srebro · Sujay Sanghavi -
2016 Poster: Efficient Nonparametric Smoothness Estimation »
Shashank Singh · Simon Du · Barnabas Poczos -
2015 Poster: Convergence Rates of Active Learning for Maximum Likelihood Estimation »
Kamalika Chaudhuri · Sham Kakade · Praneeth Netrapalli · Sujay Sanghavi -
2014 Poster: Non-convex Robust PCA »
Praneeth Netrapalli · Niranjan Uma Naresh · Sujay Sanghavi · Animashree Anandkumar · Prateek Jain -
2014 Spotlight: Non-convex Robust PCA »
Praneeth Netrapalli · Niranjan Uma Naresh · Sujay Sanghavi · Animashree Anandkumar · Prateek Jain -
2014 Poster: Greedy Subspace Clustering »
Dohyung Park · Constantine Caramanis · Sujay Sanghavi -
2014 Poster: Scalable Kernel Methods via Doubly Stochastic Gradients »
Bo Dai · Bo Xie · Niao He · Yingyu Liang · Anant Raj · Maria-Florina F Balcan · Le Song -
2013 Poster: Robust Low Rank Kernel Embeddings of Multivariate Distributions »
Le Song · Bo Dai -
2013 Poster: Phase Retrieval using Alternating Minimization »
Praneeth Netrapalli · Prateek Jain · Sujay Sanghavi -
2012 Poster: Clustering Sparse Graphs »
Yudong Chen · Sujay Sanghavi · Huan Xu -
2010 Workshop: Robust Statistical Learning »
Pradeep Ravikumar · Constantine Caramanis · Sujay Sanghavi -
2010 Oral: A Dirty Model for Multi-task Learning »
Ali Jalali · Pradeep Ravikumar · Sujay Sanghavi · Chao Ruan -
2010 Poster: Robust PCA via Outlier Pursuit »
Huan Xu · Constantine Caramanis · Sujay Sanghavi -
2010 Poster: A Dirty Model for Multi-task Learning »
Ali Jalali · Pradeep Ravikumar · Sujay Sanghavi · Chao Ruan -
2007 Spotlight: Message Passing for Max-weight Independent Set »
Sujay Sanghavi · Devavrat Shah · Alan S Willsky -
2007 Poster: Message Passing for Max-weight Independent Set »
Sujay Sanghavi · Devavrat Shah · Alan S Willsky -
2007 Poster: Linear programming analysis of loopy belief propagation for weighted matching »
Sujay Sanghavi · Dmitry Malioutov · Alan S Willsky