Better state exploration using action sequence equivalence
Nathan Grinsztajn · Toby Johnstone · Johan Ferret · Philippe Preux
Event URL: https://openreview.net/forum?id=TQsTFJUGUKQ
Incorporating prior knowledge into reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual ε-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
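To make the idea of action sequence equivalence concrete, here is a minimal, hypothetical sketch, not the authors' method: it assumes the simplest equivalence (actions commute, so two sequences are equivalent iff they use the same multiset of actions, as in an open grid world) and samples uniformly over equivalence classes rather than raw sequences, so exploration wastes fewer steps on colliding sequences. The function names are illustrative, and the paper's actual strategy is derived by solving a convex optimization problem over the action distribution, which this toy enumeration does not implement.

```python
import itertools
import random
from collections import defaultdict

def equivalence_classes(actions, horizon):
    """Group every action sequence of length `horizon` into classes.
    Assumed equivalence: order does not matter, i.e. two sequences are
    equivalent iff they use the same multiset of actions (as in a grid
    world where moves commute)."""
    classes = defaultdict(list)
    for seq in itertools.product(actions, repeat=horizon):
        classes[tuple(sorted(seq))].append(seq)
    return classes

def sample_exploration_sequence(actions, horizon, rng=random):
    """Sample a class uniformly, then a representative inside it.
    Uniform sampling over raw sequences would over-sample large classes,
    spending exploration on sequences whose end states are reachable in
    many other ways (collisions)."""
    classes = equivalence_classes(actions, horizon)
    chosen_class = rng.choice(list(classes))
    return rng.choice(classes[chosen_class])

if __name__ == "__main__":
    actions = ["up", "down", "left", "right"]
    # 4**3 = 64 raw sequences but only 20 multisets: the class of
    # ("up", "up", "up") has one member, while ("up", "down", "left")
    # has 3! = 6 orderings, so uniform-over-sequences visits its end
    # state six times as often.
    print(sample_exploration_sequence(actions, horizon=3))
```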
Author Information
Nathan Grinsztajn (Inria)
Toby Johnstone (Ecole polytechnique)
Johan Ferret (Google Brain / Inria Scool)
Philippe Preux (Inria)
More from the Same Authors
- 2021 Poster: There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning
  Nathan Grinsztajn · Johan Ferret · Olivier Pietquin · Philippe Preux · Matthieu Geist
- 2019: Poster Presentations
  Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange
- 2019: Oral Presentations
  Janith Petangoda · Sergio Pascual-Diaz · Jordi Grau-Moya · Raphaël Marinier · Olivier Pietquin · Alexei Efros · Phillip Isola · Trevor Darrell · Christopher Lu · Deepak Pathak · Johan Ferret