In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent’s success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they do not always systematically explore the whole state space, akin to depth-first search versus breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty, but punishes redundancy by avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent’s cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is vastly more efficient than other state-of-the-art methods.
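The core idea of punishing redundancy rather than rewarding novelty can be illustrated with a minimal sketch. The following is not the authors' exact formulation: the penalty value, the per-episode bookkeeping, and the assumption that observations (or their cropped representations) are hashable are all illustrative choices; the paper's hierarchical representations are omitted here.

```python
# Hypothetical sketch of a cycle-penalizing ("cyclophobic") intrinsic reward.
# It punishes revisiting a state within an episode instead of rewarding novelty.

class CyclophobicReward:
    """Tracks states visited in the current episode and penalizes revisits."""

    def __init__(self, penalty: float = -1.0):
        self.penalty = penalty
        self.visited = set()

    def reset(self) -> None:
        """Call at the start of every episode."""
        self.visited.clear()

    def __call__(self, observation) -> float:
        """Return the intrinsic reward for the given observation.

        A revisited observation closes a cycle and is punished; a first
        visit yields zero intrinsic reward (no novelty bonus).
        """
        key = hash(observation)   # assumes observations are hashable, e.g. tuples
        if key in self.visited:
            return self.penalty   # cycle detected -> punish redundancy
        self.visited.add(key)
        return 0.0


# Example usage: add the intrinsic term to the (sparse) extrinsic reward.
# intrinsic = CyclophobicReward(penalty=-0.1)
# total_reward = extrinsic_reward + intrinsic(obs_key)
```

In this sketch the agent receives no bonus for new states; it is only pushed away from states it has already seen, which encourages systematic, breadth-first-like coverage of the state space.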
Author Information
Stefan Wagner
Peter Arndt (Heinrich-Heine-Universität Düsseldorf)
Jan Robine (Technische Universität Dortmund)
Stefan Harmeling (Technische Universität Dortmund)
More from the Same Authors
- 2022: Optimizing Intermediate Representations of Generative Models for Phase Retrieval
  Tobias Uelwer · Sebastian Konietzny · Stefan Harmeling
- 2022: Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm
  Marc Höftmann · Jan Robine · Stefan Harmeling
- 2022: Transformer-based World Models Are Happy With 100k Interactions
  Jan Robine · Marc Höftmann · Tobias Uelwer · Stefan Harmeling
- 2023: Backward Learning for Goal-Conditioned Policies
  Marc Höftmann · Jan Robine · Stefan Harmeling
- 2023: A Simple Framework for Self-Supervised Learning of Sample-Efficient World Models
  Jan Robine · Marc Höftmann · Stefan Harmeling