Domain Generalization for Robust Model-Based Offline RL
Alan Clark · Shoaib Siddiqui · Robert Kirk · Usman Anwar · Stephen Chung · David Krueger
Event URL: https://openreview.net/forum?id=vW98Mf3shM

Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrator generated each dataset, but make no assumptions about the underlying policies of the demonstrators. This is the most natural setting when collecting data from multiple human operators, yet it remains unexplored. Since different demonstrators induce different data distributions, we show that this setting can be naturally framed as a domain generalization problem, with each demonstrator corresponding to a different domain. Specifically, we propose Domain-Invariant Model-based Offline RL (DIMORL), where we apply Risk Extrapolation (REx) (Krueger et al., 2020) to the process of learning dynamics and reward models. Our results show that models trained with REx exhibit improved domain generalization performance when compared with the natural baseline of pooling all demonstrators' data. We observe that the resulting models frequently enable the learning of superior policies in the offline model-based RL setting, can improve the stability of the policy learning process, and may increase exploration.
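
To make the contrast between REx and pooling concrete, the sketch below computes both objectives for a model evaluated on per-demonstrator datasets. It assumes the variance-penalized form of REx (V-REx) from Krueger et al., with a squared-error risk; the abstract does not say which REx variant or loss DIMORL uses, and the names `per_domain_risks`, `vrex_objective`, `pooled_objective`, and `beta` are illustrative rather than taken from the paper.

```python
import numpy as np


def per_domain_risks(predict, datasets):
    """Return the mean squared error of `predict` on each demonstrator's data.

    `datasets` is a list of (inputs, targets) pairs, one per demonstrator.
    Squared error is an assumption made for this sketch.
    """
    risks = []
    for inputs, targets in datasets:
        errors = predict(inputs) - targets
        risks.append(np.mean(errors ** 2))
    return np.array(risks)


def vrex_objective(risks, beta):
    """V-REx: sum of per-domain risks plus beta times their variance.

    beta = 0 ignores the spread between domains; a larger beta pushes
    the model toward equal risk across demonstrators.
    """
    return np.sum(risks) + beta * np.var(risks)


def pooled_objective(predict, datasets):
    """Baseline: treat all demonstrators' data as a single dataset."""
    inputs = np.concatenate([x for x, _ in datasets])
    targets = np.concatenate([y for _, y in datasets])
    return np.mean((predict(inputs) - targets) ** 2)


# Toy example: a fixed "model" that fits demonstrator A but not demonstrator B.
predict = lambda x: 2.0 * x
demo_a = (np.array([1.0, 2.0]), np.array([2.0, 4.0]))
demo_b = (np.array([1.0, 2.0]), np.array([3.0, 5.0]))
datasets = [demo_a, demo_b]

risks = per_domain_risks(predict, datasets)
vrex_loss = vrex_objective(risks, beta=4.0)
pooled_loss = pooled_objective(predict, datasets)
```

The pooled loss averages over all samples and so cannot tell whether error is concentrated in one demonstrator's data, whereas the variance term penalizes exactly that imbalance. In practice these quantities would be computed on minibatches and minimized with respect to the parameters of the dynamics and reward models.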

Author Information

Alan Clark (University of Cambridge)
Shoaib Siddiqui (TU Kaiserslautern)
Robert Kirk (University College London)

I’m Robert Kirk, a PhD Student at UCL DARK Lab in the UCL Centre for Artificial Intelligence supervised by Tim Rocktäschel and Ed Grefenstette. I’m an aspiring effective altruist and rationalist. I’m interested in reinforcement learning, meta learning, natural language processing, interpretability and deep learning (and all the combinations thereof).

Usman Anwar (Information Technology University, Lahore, Pakistan)

Interested in AI Safety, Deep Learning, and Reinforcement Learning. Looking for a PhD!

Stephen Chung (University of Massachusetts Amherst)

My name is Stephen Chung. I graduated from the University of Massachusetts Amherst with a master's degree in 2021.

David Krueger (University of Cambridge)
