Recently, many algorithms have been devised for reinforcement learning (RL) with function approximation. While they have clear algorithmic distinctions, they also have many implementation differences that are algorithm-independent and sometimes under-emphasized. Such mixing of algorithmic novelty and implementation craftsmanship makes rigorous analysis of the sources of performance improvements across algorithms difficult. In this work, we focus on a series of off-policy inference-based actor-critic algorithms -- MPO, AWR, and SAC -- to decouple their algorithmic innovations from their implementation decisions. We present unified derivations through a single control-as-inference objective, under which each algorithm can be categorized as based on either Expectation-Maximization (EM) or direct Kullback-Leibler (KL) divergence minimization, with the remaining specifications treated as implementation details. Through extensive ablation studies, we identify substantial performance drops whenever implementation details are mismatched with algorithmic choices. These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms: for example, we find that the tanh Gaussian policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU activations are critical for MPO's performance but also transfer to noticeable gains in SAC. We hope our work inspires future research to further demystify the sources of performance improvements across multiple algorithms and allows researchers to build on one another's algorithmic and implementational innovations.
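As a minimal sketch of the categorization described in the abstract (the exact objective form, the variational policy q, and the temperature η below are illustrative assumptions, not notation taken from the paper), the control-as-inference view can be summarized as KL-regularized return maximization over a non-parametric policy q and a parametric policy π_θ:

$$
\max_{q,\;\pi_\theta}\; \mathcal{J}(q,\pi_\theta) \;=\; \mathbb{E}_{\tau\sim q}\Big[\textstyle\sum_t r(s_t,a_t)\Big] \;-\; \eta\, D_{\mathrm{KL}}\big(q(\tau)\,\|\,\pi_\theta(\tau)\big).
$$

Under this sketch, EM-style algorithms (e.g., MPO and AWR) alternate an E-step that solves for q given the current critic with an M-step that fits π_θ to q, whereas direct KL-minimization algorithms (e.g., SAC) set q = π_θ and optimize the regularized objective directly; the remaining choices (policy parameterization, network sizes, normalization, activations) are treated as implementation details.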
Author Information
Hiroki Furuta (The University of Tokyo)
Tadashi Kozuno (University of Alberta)
Tadashi Kozuno is a postdoc at the University of Alberta. He obtained his bachelor's and master's degrees in neuroscience from Osaka University, and a PhD from the Okinawa Institute of Science and Technology. His main interest lies in efficient decision making, from both theoretical and biological perspectives.
Tatsuya Matsushima (The University of Tokyo)
Yutaka Matsuo (The University of Tokyo)
Shixiang (Shane) Gu (Google Brain, University of Cambridge)
More from the Same Authors
- 2021 Spotlight: Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization » Yusuke Iwasawa · Yutaka Matsuo
- 2021 Spotlight: A Minimalist Approach to Offline Reinforcement Learning » Scott Fujimoto · Shixiang (Shane) Gu
- 2021 : Distributional Decision Transformer for Offline Hindsight Information Matching » Hiroki Furuta · Yutaka Matsuo · Shixiang (Shane) Gu
- 2022 : Control Graph as Unified IO for Morphology-Task Generalization » Hiroki Furuta · Yusuke Iwasawa · Yutaka Matsuo · Shixiang (Shane) Gu
- 2022 Poster: Large Language Models are Zero-Shot Reasoners » Takeshi Kojima · Shixiang (Shane) Gu · Machel Reid · Yutaka Matsuo · Yusuke Iwasawa
- 2022 Poster: Langevin Autoencoders for Learning Deep Latent Variable Models » Shohei Taniguchi · Yusuke Iwasawa · Wataru Kumagai · Yutaka Matsuo
- 2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? » Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine
- 2021 Poster: A Minimalist Approach to Offline Reinforcement Learning » Scott Fujimoto · Shixiang (Shane) Gu
- 2021 Poster: Learning in two-player zero-sum partially observable Markov games with perfect recall » Tadashi Kozuno · Pierre Ménard · Remi Munos · Michal Valko
- 2021 Poster: Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation » Yunhao Tang · Tadashi Kozuno · Mark Rowland · Remi Munos · Michal Valko
- 2021 Poster: Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization » Yusuke Iwasawa · Yutaka Matsuo
- 2020 Poster: Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning » Nino Vieillard · Tadashi Kozuno · Bruno Scherrer · Olivier Pietquin · Remi Munos · Matthieu Geist
- 2020 Oral: Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning » Nino Vieillard · Tadashi Kozuno · Bruno Scherrer · Olivier Pietquin · Remi Munos · Matthieu Geist
- 2018 Poster: Data-Efficient Hierarchical Reinforcement Learning » Ofir Nachum · Shixiang (Shane) Gu · Honglak Lee · Sergey Levine
- 2017 Poster: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning » Shixiang (Shane) Gu · Timothy Lillicrap · Richard Turner · Zoubin Ghahramani · Bernhard Schölkopf · Sergey Levine
- 2015 Poster: Particle Gibbs for Infinite Hidden Markov Models » Nilesh Tripuraneni · Shixiang (Shane) Gu · Hong Ge · Zoubin Ghahramani
- 2015 Poster: Neural Adaptive Sequential Monte Carlo » Shixiang (Shane) Gu · Zoubin Ghahramani · Richard Turner