Timezone: »
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyper parameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
Author Information
Nir Levine (DeepMind)
Tom Zahavy (The Technion)
Daniel J Mankowitz (Technion)
Aviv Tamar (UC Berkeley)
Shie Mannor (Technion)
More from the Same Authors
-
2021 Spotlight: RL for Latent MDPs: Regret Guarantees and a Lower Bound »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2021 Spotlight: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 : Reinforcement Learning in Reward-Mixing MDPs »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2021 : Deep Variational Semi-Supervised Novelty Detection »
Tal Daniel · Thanard Kurutach · Aviv Tamar -
2021 : Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning »
Guy Tennenholtz · Assaf Hallak · Gal Dalal · Shie Mannor · Gal Chechik · Uri Shalit -
2021 : Latent Geodesics of Model Dynamics for Offline Reinforcement Learning »
Guy Tennenholtz · Nir Baram · Shie Mannor -
2021 : Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning »
Roy Zohar · Shie Mannor · Guy Tennenholtz -
2021 : Deep Variational Semi-Supervised Novelty Detection »
Tal Daniel · Thanard Kurutach · Aviv Tamar -
2022 : Learning Control by Iterative Inversion »
Gal Leibovich · Guy Jacob · Or Avner · Gal Novik · Aviv Tamar -
2022 : DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles »
Peter Karkus · Boris Ivanovic · Shie Mannor · Marco Pavone -
2022 : Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs »
Benjamin Fuhrer · Yuval Shpigelman · Chen Tessler · Shie Mannor · Gal Chechik · Eitan Zahavi · Gal Dalal -
2022 : SoftTreeMax: Policy Gradient with Tree Search »
Gal Dalal · Assaf Hallak · Shie Mannor · Gal Chechik -
2022 : Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs »
Benjamin Fuhrer · Yuval Shpigelman · Chen Tessler · Shie Mannor · Gal Chechik · Eitan Zahavi · Gal Dalal -
2022 Poster: Tractable Optimality in Episodic Latent MABs »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Poster: Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach »
Zohar Rimon · Aviv Tamar · Gilad Adler -
2022 Poster: Reinforcement Learning with a Terminator »
Guy Tennenholtz · Nadav Merlis · Lior Shani · Shie Mannor · Uri Shalit · Gal Chechik · Assaf Hallak · Gal Dalal -
2022 Poster: Finite Sample Analysis Of Dynamic Regression Parameter Learning »
Mark Kozdoba · Edward Moroshko · Shie Mannor · Yacov Crammer -
2022 Poster: Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning »
Guy Tennenholtz · Shie Mannor -
2022 Poster: Efficient Risk-Averse Reinforcement Learning »
Ido Greenberg · Yinlam Chow · Mohammad Ghavamzadeh · Shie Mannor -
2021 : Safe RL Panel Discussion »
Animesh Garg · Marek Petrik · Shie Mannor · Claire Tomlin · Ugo Rosolia · Dylan Hadfield-Menell -
2021 : Shie Mannor »
Shie Mannor -
2021 : Shie Mannor »
Shie Mannor -
2021 : Bootstrapped Meta-Learning »
Sebastian Flennerhag · Yannick Schroecker · Tom Zahavy · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 Poster: Twice regularized MDPs and the equivalence between robustness and regularization »
Esther Derman · Matthieu Geist · Shie Mannor -
2021 Poster: Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies »
Ron Dorfman · Idan Shenfeld · Aviv Tamar -
2021 Poster: RL for Latent MDPs: Regret Guarantees and a Lower Bound »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2021 Poster: Sim and Real: Better Together »
Shirli Di-Castro · Dotan Di Castro · Shie Mannor -
2021 Poster: Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction »
Gal Dalal · Assaf Hallak · Steven Dalton · iuri frosio · Shie Mannor · Gal Chechik -
2021 Poster: Discovery of Options via Meta-Learned Subgoals »
Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Reinforcement Learning in Reward-Mixing MDPs »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2020 : Mini-panel discussion 1 - Bridging the gap between theory and practice »
Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz -
2020 : Keynote: Aviv Tamar »
Aviv Tamar -
2020 Poster: A Self-Tuning Actor-Critic Algorithm »
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Presentations »
Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange -
2019 : Poster Spotlight 1 »
David Brandfonbrener · Joan Bruna · Tom Zahavy · Haim Kaplan · Yishay Mansour · Nikos Karampatziakis · John Langford · Paul Mineiro · Donghwan Lee · Niao He -
2019 : Adaptive Trust Region Policy Optimization: Convergence and Faster Rates of regularized MDPs »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Workshop: Safety and Robustness in Decision-making »
Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow -
2019 Poster: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies »
Yonathan Efroni · Nadav Merlis · Mohammad Ghavamzadeh · Shie Mannor -
2019 Spotlight: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies »
Yonathan Efroni · Nadav Merlis · Mohammad Ghavamzadeh · Shie Mannor -
2018 Workshop: Wordplay: Reinforcement and Language Learning in Text-based Games »
Adam Trischler · Angeliki Lazaridou · Yonatan Bisk · Wendy Tay · Nate Kushman · Marc-Alexandre Côté · Alessandro Sordoni · Daniel Ricks · Tom Zahavy · Hal Daumé III -
2018 Poster: Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2018 Spotlight: Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2018 Poster: Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning »
Tom Zahavy · Matan Haroush · Nadav Merlis · Daniel J Mankowitz · Shie Mannor -
2018 Poster: Learning Plannable Representations with Causal InfoGAN »
Thanard Kurutach · Aviv Tamar · Ge Yang · Stuart Russell · Pieter Abbeel -
2017 : Poster session 2 and coffee break »
Sean McGregor · Tobias Hagge · Markus Stoye · Trang Thi Minh Pham · Seungkyun Hong · Amir Farbin · Sungyong Seo · Susana Zoghbi · Daniel George · Stanislav Fort · Steven Farrell · Arthur Pajot · Kyle Pearson · Adam McCarthy · Cecile Germain · Dustin Anderson · Mario Lezcano Casado · Mayur Mudigonda · Benjamin Nachman · Luke de Oliveira · Li Jing · Lingge Li · Soo Kyung Kim · Timothy Gebhard · Tom Zahavy -
2017 : Poster session 1 and coffee break »
Tobias Hagge · Sean McGregor · Markus Stoye · Trang Thi Minh Pham · Seungkyun Hong · Amir Farbin · Sungyong Seo · Susana Zoghbi · Daniel George · Stanislav Fort · Steven Farrell · Arthur Pajot · Kyle Pearson · Adam McCarthy · Cecile Germain · Dustin Anderson · Mario Lezcano Casado · Mayur Mudigonda · Benjamin Nachman · Luke de Oliveira · Li Jing · Lingge Li · Soo Kyung Kim · Timothy Gebhard · Tom Zahavy -
2017 Poster: Rotting Bandits »
Nir Levine · Yacov Crammer · Shie Mannor -
2017 Poster: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments »
Ryan Lowe · YI WU · Aviv Tamar · Jean Harb · OpenAI Pieter Abbeel · Igor Mordatch -
2016 Poster: Value Iteration Networks »
Aviv Tamar · Sergey Levine · Pieter Abbeel · YI WU · Garrett Thomas -
2016 Oral: Value Iteration Networks »
Aviv Tamar · Sergey Levine · Pieter Abbeel · YI WU · Garrett Thomas -
2016 Poster: Adaptive Skills Adaptive Partitions (ASAP) »
Daniel J Mankowitz · Timothy A Mann · Shie Mannor -
2015 : Between stochastic and adversarial: forecasting with online ARMA models »
Shie Mannor -
2015 Workshop: Machine Learning for (e-)Commerce »
Esteban Arcaute · Mohammad Ghavamzadeh · Shie Mannor · Georgios Theocharous -
2015 Poster: Online Learning for Adversaries with Memory: Price of Past Mistakes »
Oren Anava · Elad Hazan · Shie Mannor -
2015 Poster: Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach »
Yinlam Chow · Aviv Tamar · Shie Mannor · Marco Pavone -
2015 Poster: Policy Gradient for Coherent Risk Measures »
Aviv Tamar · Yinlam Chow · Mohammad Ghavamzadeh · Shie Mannor -
2015 Poster: Community Detection via Measure Space Embedding »
Mark Kozdoba · Shie Mannor -
2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty) »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar -
2014 Poster: "How hard is my MDP?" The distribution-norm to the rescue »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor -
2014 Poster: Robust Logistic Regression and Classification »
Jiashi Feng · Huan Xu · Shie Mannor · Shuicheng Yan -
2014 Oral: "How hard is my MDP?" The distribution-norm to the rescue »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor -
2013 Poster: Reinforcement Learning in Robust Markov Decision Processes »
Shiau Hong Lim · Huan Xu · Shie Mannor -
2013 Poster: Online PCA for Contaminated Data »
Jiashi Feng · Huan Xu · Shie Mannor · Shuicheng Yan -
2013 Poster: Learning Multiple Models via Regularized Weighting »
Daniel Vainsencher · Shie Mannor · Huan Xu -
2012 Poster: The Perturbed Variation »
Maayan Harel · Shie Mannor -
2011 Poster: From Bandits to Experts: On the Value of Side-Observations »
Shie Mannor · Ohad Shamir -
2011 Spotlight: From Bandits to Experts: On the Value of Side-Observations »
Shie Mannor · Ohad Shamir -
2011 Poster: Committing Bandits »
Loc X Bui · Ramesh Johari · Shie Mannor -
2010 Spotlight: Online Classification with Specificity Constraints »
Andrey Bernstein · Shie Mannor · Nahum Shimkin -
2010 Poster: Online Classification with Specificity Constraints »
Andrey Bernstein · Shie Mannor · Nahum Shimkin -
2010 Poster: Distributionally Robust Markov Decision Processes »
Huan Xu · Shie Mannor