Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrator generated each dataset, but make no assumptions about the demonstrators' underlying policies. This is the most natural setting when collecting data from multiple human operators, yet it remains unexplored. Since different demonstrators induce different data distributions, we show that this setting can be naturally framed as a domain generalization problem, with each demonstrator corresponding to a different domain. Specifically, we propose Domain-Invariant Model-based Offline RL (DIMORL), in which we apply Risk Extrapolation (REx) (Krueger et al., 2020) to the learning of dynamics and reward models. Our results show that models trained with REx exhibit improved domain generalization performance compared with the natural baseline of pooling all demonstrators' data. We observe that the resulting models frequently enable the learning of superior policies in the offline model-based RL setting, can improve the stability of the policy learning process, and may increase exploration.
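The difference between pooling and applying REx to model learning can be sketched as follows. This is a minimal illustration, not the DIMORL implementation: it assumes the V-REx variant from Krueger et al. (2020), which minimizes the sum of per-domain risks plus β times their variance, and treats each demonstrator's dataset as one domain. The toy linear dynamics/reward model, the scalar transition format, and the names `model_risk`, `vrex_objective` and `pooled_objective` are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple

# Hypothetical transition format: (state, action, reward, next_state), all scalars.
Transition = Tuple[float, float, float, float]


def model_risk(params: Sequence[float], data: List[Transition]) -> float:
    """Mean squared error of a toy linear model on one demonstrator's data.

    Hypothetical model: next_state ~ a*s + b*u and reward ~ c*s + d*u.
    """
    a, b, c, d = params
    total = 0.0
    for s, u, r, s_next in data:
        total += (a * s + b * u - s_next) ** 2  # dynamics error
        total += (c * s + d * u - r) ** 2       # reward error
    return total / len(data)


def vrex_objective(params: Sequence[float],
                   datasets_by_demonstrator: Dict[str, List[Transition]],
                   beta: float) -> float:
    """V-REx: sum of per-demonstrator risks plus beta * their (population) variance."""
    risks = [model_risk(params, d) for d in datasets_by_demonstrator.values()]
    return sum(risks) + beta * pvariance(risks)


def pooled_objective(params: Sequence[float],
                     datasets_by_demonstrator: Dict[str, List[Transition]]) -> float:
    """Baseline: concatenate all demonstrators' data and compute a single risk."""
    pooled = [t for d in datasets_by_demonstrator.values() for t in d]
    return model_risk(params, pooled)


if __name__ == "__main__":
    data = {
        "demo_A": [(1.0, 0.0, 0.0, 1.0)],
        "demo_B": [(1.0, 1.0, 1.0, 3.0)],
    }
    params = (1.0, 1.0, 0.0, 1.0)  # fits demo_A exactly, misses demo_B's next state
    print(vrex_objective(params, data, beta=10.0))  # prints 3.5
    print(pooled_objective(params, data))           # prints 0.5
```

With β = 0 the V-REx objective reduces to the unweighted sum of per-demonstrator risks; larger β penalizes a model that fits some demonstrators' data much better than others, which is the invariance pressure REx adds. Note also that the pooled baseline weights demonstrators by how many transitions they contributed, whereas the per-domain sum weights each demonstrator equally.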
Author Information
Alan Clark (University of Cambridge)
Shoaib Siddiqui (TU Kaiserslautern)
Robert Kirk (University College London)
I’m Robert Kirk, a PhD Student at UCL DARK Lab in the UCL Centre for Artificial Intelligence supervised by Tim Rocktäschel and Ed Grefenstette. I’m an aspiring effective altruist and rationalist. I’m interested in reinforcement learning, meta learning, natural language processing, interpretability and deep learning (and all the combinations thereof).
Usman Anwar (Information Technology University, Lahore, Pakistan)
Interested in AI Safety, Deep Learning, and Reinforcement Learning. Looking for a PhD!
Stephen Chung (University of Massachusetts Amherst)
My name is Stephen Chung, and I graduated from the University of Massachusetts Amherst with a master's degree in 2021.
David Krueger (University of Cambridge)
More from the Same Authors
2021 : MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Mikayel Samvelyan · Robert Kirk · Vitaly Kurin · Jack Parker-Holder · Minqi Jiang · Eric Hambro · Fabio Petroni · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel
2021 : Graph Backup: Data Efficient Backup Exploiting Markovian Data
Zhengyao Jiang · Tianjun Zhang · Robert Kirk · Tim Rocktäschel · Edward Grefenstette
2021 : Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models
Enoch Tetteh · David Krueger · Joseph Paul Cohen · Yoshua Bengio
2022 : Domain Generalization for Robust Model-Based Offline Reinforcement Learning
Alan Clark · Shoaib Siddiqui · Robert Kirk · Usman Anwar · Stephen Chung · David Krueger
2022 : Mechanistic Lens on Mode Connectivity
Ekdeep S Lubana · Eric Bigelow · Robert Dick · David Krueger · Hidenori Tanaka
2022 : On The Fragility of Learned Reward Functions
Lev McKinney · Yawen Duan · Adam Gleave · David Krueger
2022 : Training Equilibria in Reinforcement Learning
Lauro Langosco · David Krueger · Adam Gleave
2022 : Assistance with large language models
Dmitrii Krasheninnikov · Egor Krasheninnikov · David Krueger
2022 : Unifying Grokking and Double Descent
Xander Davies · Lauro Langosco · David Krueger
2023 Poster: Thinker: Learning to Plan and Act
Stephen Chung · Ivan Anokhin · David Krueger
2023 Workshop: Socially Responsible Language Modelling Research (SoLaR)
Usman Anwar · David Krueger · Samuel Bowman · Jakob Foerster · Su Lin Blodgett · Roberta Raileanu · Alan Chan · Katherine Lee · Laura Ruis · Robert Kirk · Yawen Duan · Xin Chen · Kawin Ethayarajh
2022 Poster: Defining and Characterizing Reward Gaming
Joar Skalse · Nikolaus Howe · Dmitrii Krasheninnikov · David Krueger
2021 Poster: Turing Completeness of Bounded-Precision Recurrent Neural Networks
Stephen Chung · Hava Siegelmann
2021 : The NetHack Challenge + Q&A
Eric Hambro · Sharada Mohanty · Dipam Chakraborty · Edward Grefenstette · Minqi Jiang · Robert Kirk · Vitaly Kurin · Heinrich Kuttler · Vegard Mella · Nantas Nardelli · Jack Parker-Holder · Roberta Raileanu · Tim Rocktäschel · Danielle Rothermel · Mikayel Samvelyan
2021 Poster: MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents
Stephen Chung