Poster
Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
Zaiwei Chen · Siva Theja Maguluri · Sanjay Shakkottai · Karthikeyan Shanmugam
In TD-learning, off-policy sampling is known to be more practical than on-policy sampling, and by decoupling learning from data collection, it enables data reuse. It is known that policy evaluation has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of this generalized Bellman operator. Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for each $p$ in $[1,\infty)$, with a common contraction factor. Off-policy TD-learning is known to suffer from high variance due to the product of importance sampling ratios. A number of algorithms (e.g., $Q^\pi(\lambda)$, Tree-Backup$(\lambda)$, Retrace$(\lambda)$, and $Q$-trace) have been proposed in the literature to address this issue. Our results immediately imply finite-sample bounds for these algorithms. In particular, we provide the first known finite-sample guarantees for $Q^\pi(\lambda)$, Tree-Backup$(\lambda)$, and Retrace$(\lambda)$, and improve the best known bounds for $Q$-trace in \citep{chen2021finite}. Moreover, we show the bias-variance trade-offs in each of these algorithms.
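The variance issue mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: the per-step trace coefficients below follow the forms commonly given in the literature for these methods (plain importance sampling $\pi/\mu$, $Q^\pi(\lambda)$ with $\lambda$, Tree-Backup$(\lambda)$ with $\lambda\,\pi(a|s)$, Retrace$(\lambda)$ with $\lambda\min(1, \pi/\mu)$), and the function names, the two-action policies, and the horizon are hypothetical choices made only for illustration.

```python
import random
import statistics


def trace_coefficient(method, pi_prob, mu_prob, lam=1.0):
    """Per-step trace coefficient for one (state, action) pair.

    pi_prob: probability of the action under the target policy.
    mu_prob: probability of the action under the behavior policy.
    The forms below are the commonly stated ones, assumed here.
    """
    ratio = pi_prob / mu_prob
    if method == "importance_sampling":
        return ratio
    if method == "q_pi_lambda":
        return lam
    if method == "tree_backup":
        return lam * pi_prob
    if method == "retrace":
        return lam * min(1.0, ratio)
    raise ValueError(f"unknown method: {method}")


def trace_product(method, pi_probs, mu_probs, lam=1.0):
    """Product of per-step coefficients along one trajectory."""
    prod = 1.0
    for p, m in zip(pi_probs, mu_probs):
        prod *= trace_coefficient(method, p, m, lam)
    return prod


def empirical_variance(method, pi, mu, horizon=10, n_traj=2000, lam=1.0, seed=0):
    """Variance of the trace product over trajectories sampled from mu.

    pi and mu are action distributions of a single-state toy problem
    (a hypothetical setup, chosen only to keep the sketch short).
    """
    rng = random.Random(seed)
    actions = range(len(mu))
    products = []
    for _ in range(n_traj):
        taken = rng.choices(actions, weights=mu, k=horizon)
        products.append(
            trace_product(method, [pi[a] for a in taken], [mu[a] for a in taken], lam)
        )
    return statistics.pvariance(products)


if __name__ == "__main__":
    pi, mu = (0.9, 0.1), (0.5, 0.5)
    for name in ("importance_sampling", "tree_backup", "retrace"):
        print(name, empirical_variance(name, pi, mu))
```

In this toy setting the Retrace and Tree-Backup products stay in $[0, 1]$, so their variance is bounded, whereas the raw importance sampling product can grow geometrically with the horizon. The price of truncation is bias, which is the trade-off the abstract refers to.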
Author Information
Zaiwei Chen (Georgia Institute of Technology)
Siva Theja Maguluri (Georgia Institute of Technology)
Sanjay Shakkottai (University of Texas at Austin)
Karthikeyan Shanmugam (IBM Research, NY)
More from the Same Authors
2022 : Learning Certifiably Robust Controllers Using Fragile Perception »
Dawei Sun · Negin Musavi · Geir Dullerud · Sanjay Shakkottai · Sayan Mitra -
2023 Poster: Solving Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models »
Litu Rout · Negin Raoof · Giannis Daras · Constantine Caramanis · Alex Dimakis · Sanjay Shakkottai -
2022 Poster: Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret »
Orestis Papadigenopoulos · Constantine Caramanis · Sanjay Shakkottai -
2022 Poster: Minimax Regret for Cascading Bandits »
Daniel Vial · Sujay Sanghavi · Sanjay Shakkottai · R. Srikant -
2022 Poster: FedAvg with Fine Tuning: Local Updates Lead to Representation Learning »
Liam Collins · Hamed Hassani · Aryan Mokhtari · Sanjay Shakkottai -
2021 Poster: CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions »
Isha Puri · Amit Dhurandhar · Tejaswini Pedapati · Karthikeyan Shanmugam · Dennis Wei · Kush Varshney -
2021 Poster: Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning »
Sheng Zhang · Zhe Zhang · Siva Theja Maguluri -
2021 Poster: Scalable Intervention Target Estimation in Linear Models »
Burak Varici · Karthikeyan Shanmugam · Prasanna Sattigeri · Ali Tajer -
2020 Poster: Task-Robust Model-Agnostic Meta-Learning »
Liam Collins · Aryan Mokhtari · Sanjay Shakkottai -
2020 Poster: Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions »
Matthew Faw · Rajat Sen · Karthikeyan Shanmugam · Constantine Caramanis · Sanjay Shakkottai -
2020 Poster: Applications of Common Entropy for Causal Inference »
Murat Kocaoglu · Sanjay Shakkottai · Alex Dimakis · Constantine Caramanis · Sriram Vishwanath -
2020 Poster: Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes »
Zaiwei Chen · Siva Theja Maguluri · Sanjay Shakkottai · Karthikeyan Shanmugam -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Blocking Bandits »
Soumya Basu · Rajat Sen · Sujay Sanghavi · Sanjay Shakkottai -
2017 Poster: Experimental Design for Learning Causal Graphs with Latent Variables »
Murat Kocaoglu · Karthikeyan Shanmugam · Elias Bareinboim -
2017 Poster: Model-Powered Conditional Independence Test »
Rajat Sen · Ananda Theertha Suresh · Karthikeyan Shanmugam · Alex Dimakis · Sanjay Shakkottai -
2016 Poster: Regret of Queueing Bandits »
Subhashini Krishnasamy · Rajat Sen · Ramesh Johari · Sanjay Shakkottai