One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically depends on only a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
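The idea of learning one value function per reward component can be illustrated with a small tabular sketch. This is not the authors' implementation: the class name `DecomposedQ`, the use of Q-tables instead of a deep network, the per-component greedy bootstrap target, and the choice to act on the sum of the component values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class DecomposedQ:
    """Hypothetical tabular learner with one Q-table per reward component."""

    def __init__(self, n_components, n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # One independent table per component: state -> list of action values.
        self.q = [
            defaultdict(lambda: [0.0] * n_actions) for _ in range(n_components)
        ]

    def aggregate(self, state):
        # Assumption: component values are combined by summation.
        return [
            sum(head[state][a] for head in self.q)
            for a in range(self.n_actions)
        ]

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection on the aggregated values.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        values = self.aggregate(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, rewards, next_state, done):
        # `rewards` holds one entry per component (the decomposed reward).
        # Each table is updated only from its own component reward.
        for head, r in zip(self.q, rewards):
            target = r if done else r + self.gamma * max(head[next_state])
            head[state][action] += self.alpha * (target - head[state][action])


agent = DecomposedQ(n_components=2, n_actions=2)
agent.update("s", 1, rewards=[1.0, 0.0], next_state="t", done=True)
print(agent.aggregate("s"))
```

Because each table sees only its own component reward, it only needs to represent the features that component depends on, which is the property the abstract relies on.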
Author Information
Harm Van Seijen (Microsoft Research)
Mehdi Fatemi (Microsoft)
Romain Laroche (Microsoft Research)
Joshua Romoff (McGill University)
Tavian Barnes (Microsoft)
Jeffrey Tsang
More from the Same Authors
- 2022 Poster: Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning
  Riashat Islam · Hongyu Zang · Anirudh Goyal · Alex Lamb · Kenji Kawaguchi · Xin Li · Romain Laroche · Yoshua Bengio · Remi Tachet des Combes
- 2022 : Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
  Riashat Islam · Manan Tomar · Alex Lamb · Hongyu Zang · Yonathan Efroni · Dipendra Misra · Aniket Didolkar · Xin Li · Harm Van Seijen · Remi Tachet des Combes · John Langford
- 2022 : Replay Buffer With Local Forgetting for Adaptive Deep Model-Based Reinforcement Learning
  Ali Rahimi-Kalahroudi · Janarthanan Rajendran · Ida Momennejad · Harm Van Seijen · Sarath Chandar
- 2022 Poster: When does return-conditioned supervised learning work for offline reinforcement learning?
  David Brandfonbrener · Alberto Bietti · Jacob Buckman · Romain Laroche · Joan Bruna
- 2021 Poster: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
  Harsh Satija · Philip Thomas · Joelle Pineau · Romain Laroche
- 2021 Poster: Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates
  Romain Laroche · Remi Tachet des Combes
- 2021 Poster: Medical Dead-ends and Learning to Identify High-Risk States and Treatments
  Mehdi Fatemi · Taylor Killian · Jayakumar Subramanian · Marzyeh Ghassemi
- 2020 Poster: The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
  Harm Van Seijen · Hadi Nekoei · Evan Racah · Sarath Chandar
- 2020 Poster: Learning Dynamic Belief Graphs to Generalize on Text-Based Games
  Ashutosh Adhikari · Xingdi Yuan · Marc-Alexandre Côté · Mikuláš Zelinka · Marc-Antoine Rondeau · Romain Laroche · Pascal Poupart · Jian Tang · Adam Trischler · Will Hamilton
- 2019 Poster: Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
  Mahmoud Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat
- 2019 Poster: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
  Harm Van Seijen · Mehdi Fatemi · Arash Tavakoli
- 2019 Oral: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
  Harm Van Seijen · Mehdi Fatemi · Arash Tavakoli