Delayed rewards, which are separated from their causative actions by irrelevant actions, hamper learning in reinforcement learning (RL). Real-world problems, in particular, often involve such delayed and sparse rewards. Recently, return decomposition for delayed rewards (RUDDER) employed pattern recognition to remove or reduce the delay in rewards, which dramatically simplifies the learning task of the underlying RL method. RUDDER was realized using a long short-term memory (LSTM) network trained to identify important state-action patterns responsible for the return; reward was then redistributed to these important state-action pairs. However, training the LSTM is often difficult and requires a large number of episodes. In this work, we replace the LSTM with the recently proposed continuous modern Hopfield networks (MHN) and introduce Hopfield-RUDDER. MHN are powerful trainable associative memories with large storage capacity that require only a few training samples and excel at identifying and recognizing patterns. We use this property of MHN to identify important state-action pairs associated with low- or high-return episodes and directly redistribute reward to them. In partially observable environments, however, Hopfield-RUDDER requires additional information about the history of state-action pairs. We therefore evaluate several methods for compressing history and introduce reset-max history, a lightweight history compression that combines the max-operator with a reset gate. We show experimentally that Hopfield-RUDDER outperforms LSTM-based RUDDER on various 1D environments when only a small number of episodes is available. Finally, preliminary experiments on the Minecraft ObtainDiamond task from the MineRL NeurIPS challenge indicate that Hopfield-RUDDER scales to highly complex environments.
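To make the two mechanisms named in the abstract concrete, below is a minimal NumPy sketch of (a) the continuous modern Hopfield update rule ξ_new = X softmax(β Xᵀ ξ) from Ramsauer et al. (2020), which Hopfield-RUDDER uses as a trainable associative memory, and (b) one possible reading of reset-max history compression. The function names, the binary reset signal, and the assumption of non-negative features are illustrative choices of ours, not taken from the paper; only the Hopfield update rule follows the published MHN formulation.

```python
import numpy as np

def hopfield_retrieve(X, xi, beta=1.0):
    """One retrieval step of a continuous modern Hopfield network:
    xi_new = X softmax(beta * X^T xi).
    X: stored patterns as columns, shape (d, N); xi: query vector, shape (d,)."""
    a = beta * (X.T @ xi)     # similarity of the query to each stored pattern
    a = np.exp(a - a.max())   # numerically stable softmax
    p = a / a.sum()
    return X @ p              # convex combination of stored patterns

def reset_max_history(features, resets):
    """Sketch of reset-max history compression: a running element-wise max
    over state-action features, cleared whenever the reset gate fires.
    Assumes non-negative feature vectors and a binary reset sequence."""
    h = np.zeros_like(features[0])
    out = []
    for x, r in zip(features, resets):
        if r:                   # reset gate: forget the accumulated history
            h = np.zeros_like(x)
        h = np.maximum(h, x)    # max-operator: keep the strongest feature seen so far
        out.append(h.copy())
    return np.stack(out)
```

In a Hopfield-RUDDER-style setup, one would store history-augmented state-action features from low- and high-return episodes as the patterns X and query the memory with new state-action pairs; how strongly a query associates with high- versus low-return patterns can then be turned into a redistributed reward. The paper's exact redistribution rule is not reproduced here.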
Author Information
Michael Widrich (ELLIS Unit / University Linz)
Markus Hofmarcher (ELLIS Unit / University Linz)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Angela Bitto (JKU)
Sepp Hochreiter (LIT AI Lab / University Linz)
Head of the LIT AI Lab and Professor of Bioinformatics at Johannes Kepler University Linz. First to identify and analyze the vanishing gradient problem, the fundamental deep learning problem, in 1991. First author of the main paper on the now widely used LSTM RNNs. He implemented 'learning how to learn' (meta-learning) networks via LSTM RNNs and applied deep learning and RNNs to self-driving cars, sentiment analysis, reinforcement learning, bioinformatics, and medicine.
More from the Same Authors
- 2021 : Assigning Credit to Human Decisions using Modern Hopfield Networks »
  Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter
- 2021 : Understanding the Effects of Dataset Composition on Offline Reinforcement Learning »
  Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter
- 2021 : Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning »
  Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter
- 2021 : Modern Hopfield Networks for Return Decomposition for Delayed Rewards »
  Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter
- 2022 : Boosting Multi-modal Contrastive Learning with Modern Hopfield Networks and InfoLOOB »
  Andreas Fürst · Elisabeth Rumetshofer · Johannes Lehner · Viet T. Tran · Fei Tang · Hubert Ramsauer · David Kreil · Michael Kopp · Günter Klambauer · Angela Bitto · Sepp Hochreiter
- 2022 : Modern Hopfield Networks for Iterative Learning on Tabular Data »
  Bernhard Schäfl · Lukas Gruber · Angela Bitto · Sepp Hochreiter
- 2022 : Toward Semantic History Compression for Reinforcement Learning »
  Fabian Paischer · Thomas Adler · Andreas Radler · Markus Hofmarcher · Sepp Hochreiter
- 2022 : Foundation Models for History Compression in Reinforcement Learning »
  Fabian Paischer · Thomas Adler · Andreas Radler · Markus Hofmarcher · Sepp Hochreiter
- 2022 : Informative rewards and generalization in curriculum learning »
  Rahul Siripurapu · Vihang Patil · Kajetan Schweighofer · Marius-Constantin Dinu · Markus Holzleitner · Hamid Eghbalzadeh · Luis Ferro · Thomas Schmied · Michael Kopp · Sepp Hochreiter
- 2022 Poster: CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP »
  Andreas Fürst · Elisabeth Rumetshofer · Johannes Lehner · Viet T. Tran · Fei Tang · Hubert Ramsauer · David Kreil · Michael Kopp · Günter Klambauer · Angela Bitto · Sepp Hochreiter
- 2021 : Understanding the Effects of Dataset Composition on Offline Reinforcement Learning »
  Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Angela Bitto · Philipp Renz · Vihang Patil · Sepp Hochreiter
- 2020 Poster: Modern Hopfield Networks and Attention for Immune Repertoire Classification »
  Michael Widrich · Bernhard Schäfl · Milena Pavlović · Hubert Ramsauer · Lukas Gruber · Markus Holzleitner · Johannes Brandstetter · Geir Kjetil Sandve · Victor Greiff · Sepp Hochreiter · Günter Klambauer
- 2020 Spotlight: Modern Hopfield Networks and Attention for Immune Repertoire Classification »
  Michael Widrich · Bernhard Schäfl · Milena Pavlović · Hubert Ramsauer · Lukas Gruber · Markus Holzleitner · Johannes Brandstetter · Geir Kjetil Sandve · Victor Greiff · Sepp Hochreiter · Günter Klambauer
- 2020 : Modern Hopfield Networks and Attention for Immune Repertoire Classification »
  Michael Widrich
- 2019 Poster: RUDDER: Return Decomposition for Delayed Rewards »
  Jose A. Arjona-Medina · Michael Gillhofer · Michael Widrich · Thomas Unterthiner · Johannes Brandstetter · Sepp Hochreiter
- 2017 : Invited Talk 3 »
  Sepp Hochreiter
- 2017 : Panel: Machine learning and audio signal processing: State of the art and future perspectives »
  Sepp Hochreiter · Bo Li · Karen Livescu · Arindam Mandal · Oriol Nieto · Malcolm Slaney · Hendrik Purwins
- 2017 Spotlight: Self-Normalizing Neural Networks »
  Günter Klambauer · Thomas Unterthiner · Andreas Mayr · Sepp Hochreiter
- 2017 Poster: Self-Normalizing Neural Networks »
  Günter Klambauer · Thomas Unterthiner · Andreas Mayr · Sepp Hochreiter
- 2017 Poster: GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium »
  Martin Heusel · Hubert Ramsauer · Thomas Unterthiner · Bernhard Nessler · Sepp Hochreiter
- 2016 Symposium: Recurrent Neural Networks and Other Machines that Learn Algorithms »
  Jürgen Schmidhuber · Sepp Hochreiter · Alex Graves · Rupesh K Srivastava
- 2015 Poster: Rectified Factor Networks »
  Djork-Arné Clevert · Andreas Mayr · Thomas Unterthiner · Sepp Hochreiter