Using Confounded Data in Offline RL
Maxime Gasse · Damien GRASSET · Guillaume Gaudron · Pierre-Yves Oudeyer
Event URL: https://openreview.net/forum?id=NGLwjQj4TY
In this work we consider the problem of confounding in offline RL, also called the delusion problem. While it is known that learning from purely offline data is a hazardous endeavor in the presence of confounding, in this paper we show that offline, confounded data can be safely combined with online, non-confounded data to improve the sample-efficiency of model-based RL. We import ideas from the well-established framework of $do$-calculus to express model-based RL as a causal inference problem, thus bridging the fields of RL and causality. We propose a latent-based method which we prove is correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline, confounded data (in the asymptotic case), regardless of the expert's behavior. We illustrate the effectiveness of our method on a series of synthetic experiments.
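The confounding (delusion) problem described above shows up even in a deliberately tiny, hypothetical one-step example. This is not the authors' latent-based method. It only illustrates why offline expert data can mislead a model and why data from an agent acting independently of the hidden variable does not. The setup is assumed for illustration:

- A hidden binary variable `u` is observed by the expert but never logged.
- The expert always plays `a = u`.
- The reward is 1 when the action matches `u`.

All names (`expert_policy`, `uniform_policy`, `conditional_reward`, `interventional_reward`) are illustrative. Probabilities are computed exactly with `fractions.Fraction` rather than estimated by sampling.

```python
from fractions import Fraction

# Hypothetical toy setting (assumption, for illustration only):
#   u ~ Bernoulli(1/2) is a hidden confounder, seen by the expert but not logged
#   r = 1 if the action matches u, else 0
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}
ACTIONS = (0, 1)


def reward(u: int, a: int) -> int:
    """Reward of action a when the hidden state is u."""
    return 1 if a == u else 0


def expert_policy(u: int) -> dict:
    """Offline expert: uses the hidden u to always pick the matching action."""
    return {a: Fraction(1) if a == u else Fraction(0) for a in ACTIONS}


def uniform_policy(u: int) -> dict:
    """Online agent: ignores u and picks each action with probability 1/2."""
    return {a: Fraction(1, 2) for a in ACTIONS}


def conditional_reward(a: int, policy) -> Fraction:
    """E[r | a] under the data distribution induced by `policy`."""
    joint = {u: P_U[u] * policy(u)[a] for u in P_U}  # P(u, a)
    p_a = sum(joint.values())                         # P(a)
    return sum(joint[u] * reward(u, a) for u in P_U) / p_a


def interventional_reward(a: int) -> Fraction:
    """E[r | do(a)]: the action is set externally, so u keeps its prior."""
    return sum(P_U[u] * reward(u, a) for u in P_U)


for a in ACTIONS:
    print(f"a={a}: offline E[r|a]={conditional_reward(a, expert_policy)}, "
          f"online E[r|a]={conditional_reward(a, uniform_policy)}, "
          f"E[r|do(a)]={interventional_reward(a)}")
```

Under these assumptions, conditioning on the action in the expert's logs gives an expected reward of 1 for either action. The interventional quantity E[r | do(a)] is only 1/2, so a model fitted naively to the offline data is deluded. Data collected by an agent whose actions do not depend on `u` recovers E[r | do(a)]. Here, a uniform random policy stands in for online interaction.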
Author Information
Maxime Gasse (Polytechnique Montréal)
I am a machine learning researcher within the Data Science for Real-Time Decision Making Canada Excellence Research Chair (CERC), and also part of the MILA research institute on artificial intelligence in Montréal, Canada. The question that motivates my research is: can machines think? My broad research interests include:
- probabilistic graphical models and their theoretical properties (my PhD thesis)
- structured prediction, in particular multi-label classification
- combinatorial optimization using machine learning (see our Ecole library)
- causality, specifically in the context of reinforcement learning
Damien GRASSET (Ecole Polytechnique, France)
Guillaume Gaudron (Ubisoft)
Pierre-Yves Oudeyer (INRIA)
More from the Same Authors
2021 Competition: Machine Learning for Combinatorial Optimization (ML4CO)
Christopher Morris · Maxime Gasse
2023 : SBMLtoODEjax: Efficient Simulation and Optimization of Biological Network Models in JAX
Mayalen Etcheverry · Michael Levin · Clément Moulin-Frier · Pierre-Yves Oudeyer
2023 : Codeplay: Autotelic Learning through Collaborative Self-Play in Programming Environments
Laetitia Teodorescu · Cédric Colas · Matthew Bowers · Thomas Carta · Pierre-Yves Oudeyer
2023 : Paper 35: Generative AI in the classroom: can student remain active learners?
RANIA ABDELGHANI · Hélène Sauzéon · Pierre-Yves Oudeyer
2023 : ACES: generating diverse programming puzzles with autotelic language models and semantic descriptors
Julien Pourcel · Cédric Colas · Pierre-Yves Oudeyer · Laetitia Teodorescu
2023 : The Unsolved Challenges of LLMs in Open-Ended Web Tasks: A Case Study
Rim Assouel · Tom Marty · Massimo Caccia · Issam Hadj Laradji · Alexandre Drouin · Sai Rajeswar Mudumba · Hector Palacios · Quentin Cappart · David Vazquez · Nicolas Chapados · Maxime Gasse · Alexandre Lacoste
2022 Poster: EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
Thomas Carta · Pierre-Yves Oudeyer · Olivier Sigaud · Sylvain Lamprier
2022 Poster: Learning to Branch with Tree MDPs
Lara Scavuzzo · Feng Chen · Didier Chetelat · Maxime Gasse · Andrea Lodi · Neil Yorke-Smith · Karen Aardal
2021 : Sculpting (human-like) AI systems by sculpting their (social) environments
Pierre-Yves Oudeyer
2021 : Machine Learning for Combinatorial Optimization + Q&A
Maxime Gasse · Simon Bowly · Chris Cameron · Quentin Cappart · Jonas Charfreitag · Laurent Charlin · Shipra Agrawal · Didier Chetelat · Justin Dumouchelle · Ambros Gleixner · Aleksandr Kazachkov · Elias Khalil · Pawel Lichocki · Andrea Lodi · Miles Lubin · Christopher Morris · Dimitri Papageorgiou · Augustin Parjadis · Sebastian Pokutta · Antoine Prouvost · Yuandong Tian · Lara Scavuzzo · Giulia Zarpellon
2021 Poster: Grounding Spatio-Temporal Language with Transformers
Tristan Karch · Laetitia Teodorescu · Katja Hofmann · Clément Moulin-Frier · Pierre-Yves Oudeyer
2020 : Panel discussion
Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel
2020 : Invited talk: Pierre-Yves Oudeyer "Machines that invent their own problems: Towards open-ended learning of skills"
Pierre-Yves Oudeyer
2020 Poster: Hybrid Models for Learning to Branch
Prateek Gupta · Maxime Gasse · Elias Khalil · Pawan K Mudigonda · Andrea Lodi · Yoshua Bengio
2020 Poster: Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems
Mayalen Etcheverry · Clément Moulin-Frier · Pierre-Yves Oudeyer
2020 Oral: Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems
Mayalen Etcheverry · Clément Moulin-Frier · Pierre-Yves Oudeyer
2020 Poster: Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration
Cédric Colas · Tristan Karch · Nicolas Lair · Jean-Michel Dussoux · Clément Moulin-Frier · Peter F Dominey · Pierre-Yves Oudeyer
2016 Demonstration: Autonomous exploration, active learning and human guidance with open-source Poppy humanoid robot platform and Explauto library
Sébastien Forestier · Yoan Mollard · Pierre-Yves Oudeyer
2012 Poster: Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
Manuel Lopes · Tobias Lang · Marc Toussaint · Pierre-Yves Oudeyer