Reinforcement learning (RL) in long-horizon, sparse-reward tasks is notoriously difficult and requires many training steps. A standard way to speed up learning is to leverage additional reward signals that shape the reward to better guide the learning process. In language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward. In this paper, we leverage this idea and propose an automated reward shaping method in which the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives rely on a question generation (QG) and a question answering (QA) system: they consist of questions that lead the agent to try to reconstruct partial information about the global goal using its own trajectory. When it succeeds, the agent receives an intrinsic reward proportional to its confidence in its answer. This incentivizes the agent to generate trajectories that unambiguously explain various aspects of the general language goal. Our experimental study on several BabyAI environments shows that this approach, which requires no engineer intervention to design the auxiliary objectives, improves sample efficiency by effectively directing exploration.
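The abstract's core mechanism can be illustrated with a minimal sketch. All function names and the toy confidence heuristic below are illustrative assumptions, not the paper's actual QG/QA models: the idea is only that questions are derived from the language goal, answered from the trajectory alone, and that the intrinsic bonus scales with answer confidence.

```python
# Hedged sketch of QA-confidence-based reward shaping.
# The QG and QA steps are crude stand-ins for the learned systems
# described in the paper; only the reward structure is the point.

def generate_questions(goal):
    """Hypothetical QG step: derive auxiliary questions from the language goal."""
    # e.g. "go to the red door" -> questions about attributes of the target
    return [f"What is the {attr} of the target?" for attr in ("colour", "object type")]

def answer_confidence(question, trajectory):
    """Hypothetical QA step: confidence in answering from the trajectory alone."""
    # Stand-in heuristic: fraction of long question tokens that appear
    # somewhere in the trajectory's step descriptions, giving a score in [0, 1].
    tokens = [t for t in question.lower().split() if len(t) > 3]
    hits = sum(any(t in step for step in trajectory) for t in tokens)
    return hits / max(len(tokens), 1)

def shaped_reward(env_reward, goal, trajectory, beta=0.1):
    """Extrinsic reward plus an intrinsic bonus proportional to QA confidence."""
    bonus = sum(answer_confidence(q, trajectory) for q in generate_questions(goal))
    return env_reward + beta * bonus
```

A trajectory that makes the goal's attributes recoverable earns a positive bonus even before the sparse extrinsic reward arrives, which is the shaping effect the paper exploits.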
Author Information
Thomas Carta (INRIA)
Ph.D. candidate in the Inria Flowers team, working on language-guided autonomous deep reinforcement learning agents. Supervisors: Pierre-Yves Oudeyer (Inria Bordeaux, Flowers team), Olivier Sigaud (ISIR, AMAC team), and Sylvain Lamprier (Université d'Angers).
Pierre-Yves Oudeyer (INRIA)
Olivier Sigaud (Sorbonne University)
Sylvain Lamprier (Université d'Angers)
More from the Same Authors
- 2022 : Overcoming Referential Ambiguity in language-guided goal-conditioned Reinforcement Learning »
  Hugo Caselles-Dupré · Olivier Sigaud · Mohamed Chetouani
- 2022 : Using Confounded Data in Offline RL »
  Maxime Gasse · Damien Grasset · Guillaume Gaudron · Pierre-Yves Oudeyer
- 2022 Poster: Pragmatically Learning from Pedagogical Demonstrations in Multi-Goal Environments »
  Hugo Caselles-Dupré · Olivier Sigaud · Mohamed Chetouani
- 2021 : Sculpting (human-like) AI systems by sculpting their (social) environments »
  Pierre-Yves Oudeyer
- 2021 Poster: Grounding Spatio-Temporal Language with Transformers »
  Tristan Karch · Laetitia Teodorescu · Katja Hofmann · Clément Moulin-Frier · Pierre-Yves Oudeyer
- 2020 : Panel discussion »
  Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel
- 2020 : Invited talk: Pierre-Yves Oudeyer "Machines that invent their own problems: Towards open-ended learning of skills" »
  Pierre-Yves Oudeyer
- 2020 Poster: Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems »
  Mayalen Etcheverry · Clément Moulin-Frier · Pierre-Yves Oudeyer
- 2020 Oral: Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems »
  Mayalen Etcheverry · Clément Moulin-Frier · Pierre-Yves Oudeyer
- 2020 Poster: Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration »
  Cédric Colas · Tristan Karch · Nicolas Lair · Jean-Michel Dussoux · Clément Moulin-Frier · Peter F Dominey · Pierre-Yves Oudeyer
- 2019 Poster: Learning Compositional Neural Programs with Recursive Tree Search and Planning »
  Thomas Pierrot · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas
- 2019 Spotlight: Learning Compositional Neural Programs with Recursive Tree Search and Planning »
  Thomas Pierrot · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas
- 2016 Demonstration: Autonomous exploration, active learning and human guidance with open-source Poppy humanoid robot platform and Explauto library »
  Sébastien Forestier · Yoan Mollard · Pierre-Yves Oudeyer
- 2012 Poster: Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress »
  Manuel Lopes · Tobias Lang · Marc Toussaint · Pierre-Yves Oudeyer