In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
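The action-gap argument in the abstract can be made concrete with a toy example. The sketch below is not the authors' algorithm or experimental setup. It assumes a hypothetical deterministic chain in which "right" moves one state toward a terminal reward of 1 and "stay" keeps the agent in place with reward 0. The function names `optimal_q`, `action_gaps`, and `log_action_gaps` are illustrative only. With these assumptions, the ordinary action-gap shrinks geometrically with distance from the reward, while the gap between log-values is the same in every state.

```python
import math


def optimal_q(n_states, gamma, sweeps=1000):
    """Value iteration on a hypothetical deterministic chain (illustrative only).

    Action 0 ("right") moves s -> s + 1; stepping off the last state ends the
    episode with reward 1. Action 1 ("stay") keeps the agent in s, reward 0.
    """
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(sweeps):
        v = [max(row) for row in q]  # greedy state values from the previous sweep
        for s in range(n_states):
            if s == n_states - 1:
                q[s][0] = 1.0  # terminal transition: reward 1, no bootstrap
            else:
                q[s][0] = gamma * v[s + 1]
            q[s][1] = gamma * v[s]
    return q


def action_gaps(q):
    """Difference between the best and second-best action value per state."""
    return [max(row) - min(row) for row in q]


def log_action_gaps(q):
    """The same gap measured after mapping the values to logarithmic space."""
    return [math.log(max(row)) - math.log(min(row)) for row in q]


if __name__ == "__main__":
    q = optimal_q(n_states=10, gamma=0.5)
    gaps = action_gaps(q)
    print("largest / smallest action-gap:", max(gaps) / min(gaps))
    print("log-space action-gaps:", log_action_gaps(q))
```

In this chain the optimal values are powers of the discount factor, so the ratio between the largest and smallest action-gap grows as the discount factor is lowered or the chain is lengthened, whereas the log-space gap depends only on the discount factor. Real problems with stochastic transitions or negative rewards need more care than this sketch provides.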
Harm Van Seijen (Microsoft Research)
Mehdi Fatemi (Microsoft Research)
Arash Tavakoli (Imperial College London)
I am a Ph.D. candidate at Imperial College London. My research interests lie broadly in Artificial Intelligence, with particular focus on Machine Learning and Reinforcement Learning.
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
Wed Dec 11th 01:30 -- 03:30 AM Room East Exhibition Hall B + C
More from the Same Authors
2020 Poster: The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
Harm Van Seijen · Hadi Nekoei · Evan Racah · Sarath Chandar
2017 Poster: Hybrid Reward Architecture for Reinforcement Learning
Harm Van Seijen · Mehdi Fatemi · Romain Laroche · Joshua Romoff · Tavian Barnes · Jeffrey Tsang