Estimating the per-state expected cumulative reward is a critical aspect of reinforcement learning approaches, regardless of how the experience is obtained, but standard deep neural-network function-approximation methods are often inefficient in this setting. An alternative approach, exemplified by value iteration networks, is to learn the transition and reward models of a latent Markov decision process whose value predictions fit the data. This approach has been shown empirically to converge faster to a more robust solution in many cases, but there has been little theoretical study of the phenomenon. In this paper, we explore such implicit representations of value functions via theory and focused experimentation. We prove that, for a linear parametrization, gradient descent converges to global optima despite the non-linearity and non-convexity introduced by the implicit representation. Furthermore, we derive convergence rates for both the implicit and explicit representations, which allow us to identify conditions under which stochastic gradient descent (SGD) with the implicit representation converges substantially faster than its explicit counterpart. Finally, we provide empirical results in some simple domains that illustrate the theoretical findings.
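To make the idea of an implicit value representation concrete: rather than predicting values directly with a function approximator, the learner parametrizes a latent MDP (transition tensor and reward table) and obtains values by running value iteration to an approximate fixed point of the Bellman optimality operator. The sketch below is a minimal, hypothetical illustration of this construction for a small tabular latent MDP, not the paper's implementation; the function name, array shapes, and toy MDP are assumptions for exposition.

```python
import numpy as np

def implicit_value(P, r, gamma=0.9, iters=50):
    """Implicit value representation: V is defined by the latent model (P, r).

    P : (n_actions, n_states, n_states) latent transition probabilities
    r : (n_actions, n_states) latent rewards
    Returns the approximate fixed point of the Bellman optimality operator,
    obtained by running `iters` sweeps of value iteration.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[a, s] = r[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = r + gamma * (P @ V)
        V = Q.max(axis=0)  # greedy backup over actions
    return V

# Toy latent MDP with one action and two states: staying in state 0
# yields reward 1 each step, state 1 is absorbing with reward 0.
P = np.array([[[1.0, 0.0],
               [0.0, 1.0]]])
r = np.array([[1.0, 0.0]])
V = implicit_value(P, r)  # V[0] approaches 1 / (1 - gamma) = 10
```

Because `V` is a differentiable function of `(P, r)` (through the value-iteration computation graph), a loss on the predicted values can be back-propagated into the latent model parameters, which is the implicit-parametrization setting the paper analyzes.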
Author Information
Clement Gehring (Massachusetts Institute of Technology)
Kenji Kawaguchi (MIT)
Jiaoyang Huang (New York University)
Leslie Kaelbling (MIT)
More from the Same Authors
- 2020 : Robotic gripper design with Evolutionary Strategies and Graph Element Networks »
  Ferran Alet · Maria Bauza · Adarsh K Jeewajee · Max Thomsen · Alberto Rodriguez · Leslie Kaelbling · Tomás Lozano-Pérez
- 2021 : Catastrophic Failures of Neural Active Learning on Heteroskedastic Distributions »
  Savya Khosla · Alex Lamb · Jordan Ash · Cyril Zhang · Kenji Kawaguchi
- 2021 : Noether Networks: Meta-Learning Useful Conserved Quantities »
  Ferran Alet · Dylan Doblar · Allan Zhou · Josh Tenenbaum · Kenji Kawaguchi · Chelsea Finn
- 2022 : Solving PDDL Planning Problems with Pretrained Large Language Models »
  Tom Silver · Varun Hariprasad · Reece Shuttleworth · Nishanth Kumar · Tomás Lozano-Pérez · Leslie Kaelbling
- 2022 Poster: PDSketch: Integrated Domain Programming, Learning, and Planning »
  Jiayuan Mao · Tomás Lozano-Pérez · Josh Tenenbaum · Leslie Kaelbling
- 2021 Poster: Adversarial Training Helps Transfer Learning via Better Representations »
  Zhun Deng · Linjun Zhang · Kailas Vodrahalli · Kenji Kawaguchi · James Zou
- 2021 Poster: EIGNN: Efficient Infinite-Depth Graph Neural Networks »
  Juncheng Liu · Kenji Kawaguchi · Bryan Hooi · Yiwei Wang · Xiaokui Xiao
- 2021 Poster: Noether Networks: meta-learning useful conserved quantities »
  Ferran Alet · Dylan Doblar · Allan Zhou · Josh Tenenbaum · Kenji Kawaguchi · Chelsea Finn
- 2021 Poster: Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time »
  Ferran Alet · Maria Bauza · Kenji Kawaguchi · Nurullah Giray Kuru · Tomás Lozano-Pérez · Leslie Kaelbling
- 2021 Poster: Discrete-Valued Neural Communication »
  Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio
- 2020 Poster: Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models »
  Adarsh Keshav Jeewajee · Leslie Kaelbling
- 2020 : Doing for our robots what nature did for us »
  Leslie Kaelbling
- 2019 : Poster and Coffee Break 1 »
  Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova
- 2019 Poster: Neural Relational Inference with Fast Modular Meta-learning »
  Ferran Alet · Erica Weng · Tomás Lozano-Pérez · Leslie Kaelbling
- 2018 : Discussion Panel: Ryan Adams, Nicolas Heess, Leslie Kaelbling, Shie Mannor, Emo Todorov (moderator: Roy Fox) »
  Ryan Adams · Nicolas Heess · Leslie Kaelbling · Shie Mannor · Emo Todorov · Roy Fox
- 2018 : On the Value of Knowing What You Don't Know: Learning to Sample and Sampling to Learn for Robot Planning (Leslie Kaelbling) »
  Leslie Kaelbling
- 2018 : Leslie Kaelbling »
  Leslie Kaelbling
- 2018 Workshop: Infer to Control: Probabilistic Reinforcement Learning and Structured Control »
  Leslie Kaelbling · Martin Riedmiller · Marc Toussaint · Igor Mordatch · Roy Fox · Tuomas Haarnoja
- 2018 : Talk 8: Leslie Kaelbling - Learning models of very large hybrid domains »
  Leslie Kaelbling
- 2018 Poster: Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior »
  Zi Wang · Beomjoon Kim · Leslie Kaelbling
- 2018 Spotlight: Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior »
  Zi Wang · Beomjoon Kim · Leslie Kaelbling
- 2016 Poster: Deep Learning without Poor Local Minima »
  Kenji Kawaguchi
- 2016 Oral: Deep Learning without Poor Local Minima »
  Kenji Kawaguchi
- 2015 Poster: Bayesian Optimization with Exponential Convergence »
  Kenji Kawaguchi · Leslie Kaelbling · Tomás Lozano-Pérez
- 2008 Poster: Multi-Agent Filtering with Infinitely Nested Beliefs »
  Luke Zettlemoyer · Brian Milch · Leslie Kaelbling
- 2008 Spotlight: Multi-Agent Filtering with Infinitely Nested Beliefs »
  Luke Zettlemoyer · Brian Milch · Leslie Kaelbling
- 2007 Workshop: The Grammar of Vision: Probabilistic Grammar-Based Models for Visual Scene Understanding and Object Categorization »
  Virginia Savova · Josh Tenenbaum · Leslie Kaelbling · Alan Yuille