Timezone: »
Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular, when TD is performed in a universal reproducing kernel Hilbert space (RKHS), we prove convergence of the averaged iterates to the optimal value function, even when it does not belong to the RKHS. We provide explicit convergence rates that depend on a source condition relating the regularity of the optimal value function to the RKHS. We illustrate this convergence numerically on a simple continuous-state Markov reward process.
Author Information
Eloïse Berthier (Inria / ENS Paris)
Ziad Kobeissi (INRIA)
Francis Bach (INRIA - Ecole Normale Superieure)
Francis Bach is a researcher at INRIA, leading since 2011 the SIERRA project-team, which is part of the Computer Science Department at Ecole Normale Supérieure in Paris, France. After completing his Ph.D. in Computer Science at U.C. Berkeley, he spent two years at Ecole des Mines, and joined INRIA and Ecole Normale Supérieure in 2007. He is interested in statistical machine learning, and especially in convex optimization, combinatorial optimization, sparse methods, kernel-based learning, vision and signal processing. He gave numerous courses on optimization in the last few years in summer schools. He has been program co-chair for the International Conference on Machine Learning in 2015.
More from the Same Authors
-
2022 Spotlight: Lightning Talks 1A-4 »
Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu -
2022 Spotlight: A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning »
Eloïse Berthier · Ziad Kobeissi · Francis Bach -
2022 Poster: Variational inference via Wasserstein gradient flows »
Marc Lambert · Sinho Chewi · Francis Bach · Silvère Bonnabel · Philippe Rigollet -
2022 Poster: Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays »
Konstantin Mishchenko · Francis Bach · Mathieu Even · Blake Woodworth -
2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization »
Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting -
2022 Poster: Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization »
Benjamin Dubois-Taine · Francis Bach · Quentin Berthet · Adrien Taylor -
2022 Poster: Active Labeling: Streaming Stochastic Gradients »
Vivien Cabannes · Francis Bach · Vianney Perchet · Alessandro Rudi -
2020 : Francis Bach - Where is Machine Learning Going? »
Francis Bach -
2017 : Concluding remarks »
Francis Bach · Benjamin Guedj · Pascal Germain -
2017 : Neil Lawrence, Francis Bach and François Laviolette »
Neil Lawrence · Francis Bach · Francois Laviolette -
2017 : Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance »
Francis Bach -
2017 : Overture »
Benjamin Guedj · Francis Bach · Pascal Germain -
2017 Workshop: (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights »
Benjamin Guedj · Pascal Germain · Francis Bach -
2017 Poster: On Structured Prediction Theory with Calibrated Convex Surrogate Losses »
Anton Osokin · Francis Bach · Simon Lacoste-Julien -
2017 Oral: On Structured Prediction Theory with Calibrated Convex Surrogate Losses »
Anton Osokin · Francis Bach · Simon Lacoste-Julien -
2017 Poster: Nonlinear Acceleration of Stochastic Algorithms »
Damien Scieur · Francis Bach · Alexandre d'Aspremont -
2017 Poster: Integration Methods and Optimization Algorithms »
Damien Scieur · Vincent Roulet · Francis Bach · Alexandre d'Aspremont -
2016 : Francis Bach. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression. »
Francis Bach -
2016 : Submodular Functions: from Discrete to Continuous Domains »
Francis Bach -
2016 Tutorial: Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity »
Suvrit Sra · Francis Bach