Poster
Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
Ron Dorfman · Idan Shenfeld · Aviv Tamar
Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own different task, the meta-agent must identify regularities in the data that lead to effective exploration/exploitation in the unseen task. Here, we take a Bayesian RL (BRL) view, and seek to learn a Bayes-optimal policy from the offline data. Building on the recent VariBAD BRL approach, we develop an off-policy BRL method that learns to plan an exploration strategy based on an adaptive neural belief estimate. However, learning to infer such a belief from offline data brings a new identifiability issue we term MDP ambiguity. We characterize the problem, and suggest resolutions via data collection and modification procedures. Finally, we evaluate our framework on a diverse set of domains, including difficult sparse reward tasks, and demonstrate learning of effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data. Our code is available online at \url{https://github.com/Rondorf/BOReL}.
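To make the Bayesian RL view concrete, here is a minimal toy sketch (not the authors' code; all names and the two-task setup are illustrative) of the core idea: maintain a posterior belief over which task generated the observed rewards, and condition the policy on that belief. In the sketch, the unknown task is the location of a goal on a line (either $-1$ or $+1$), the reward is the negative distance to the goal, and the belief is updated by Bayes' rule after each step. The MDP ambiguity discussed in the paper arises when such reward likelihoods cannot be disentangled from offline data alone, which is what the proposed data collection and modification procedures address.

```python
import math

def bayes_update(p_left, reward, pos):
    """Posterior probability that the goal is at -1, after observing
    `reward` at position `pos`. Rewards are modeled as Gaussian around
    the negative distance to the goal (illustrative assumption)."""
    def likelihood(goal):
        mean = -abs(pos - goal)
        return math.exp(-0.5 * (reward - mean) ** 2)
    num = p_left * likelihood(-1.0)
    den = num + (1.0 - p_left) * likelihood(1.0)
    return num / den

def policy(p_left, pos):
    """Belief-conditioned policy: move toward the currently more
    likely goal, and stay put once it is reached."""
    goal = -1.0 if p_left >= 0.5 else 1.0
    if abs(pos - goal) < 1e-9:
        return 0
    return -1 if pos > goal else 1

# Roll out in the true task (goal at -1): the belief concentrates on it,
# and the policy's behavior shifts from exploring to exploiting.
p_left, pos = 0.5, 0.0
for _ in range(5):
    action = policy(p_left, pos)
    pos += 0.25 * action
    reward = -abs(pos - (-1.0))  # noiseless reward from the true task
    p_left = bayes_update(p_left, reward, pos)
```

After the rollout the posterior `p_left` is close to 1: acting to reduce task uncertainty and then exploiting is exactly the kind of behavior a Bayes-optimal policy exhibits, which a belief inferred from offline data (as in the paper's method) is meant to enable.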
Author Information
Ron Dorfman (Technion - Israel Institute of Technology)
Idan Shenfeld (Technion)
Aviv Tamar (Technion)
More from the Same Authors
- 2021 : Deep Variational Semi-Supervised Novelty Detection »
  Tal Daniel · Thanard Kurutach · Aviv Tamar
- 2022 : Learning Control by Iterative Inversion »
  Gal Leibovich · Guy Jacob · Or Avner · Gal Novik · Aviv Tamar
- 2023 Poster: Explore to Generalize in Zero-Shot RL »
  Ev Zisselman · Itai Lavie · Daniel Soudry · Aviv Tamar
- 2023 Workshop: Generalization in Planning (GenPlan '23) »
  Pulkit Verma · Siddharth Srivastava · Aviv Tamar · Felipe Trevizan
- 2022 Poster: Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach »
  Zohar Rimon · Aviv Tamar · Gilad Adler
- 2020 : Mini-panel discussion 1 - Bridging the gap between theory and practice »
  Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz
- 2020 : Keynote: Aviv Tamar »
  Aviv Tamar
- 2019 : Poster Presentations »
  Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange
- 2017 Poster: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments »
  Ryan Lowe · YI WU · Aviv Tamar · Jean Harb · OpenAI Pieter Abbeel · Igor Mordatch