Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal
Event URL: https://openreview.net/forum?id=IaaRcteVzuc

Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data. The resulting causally confused behaviors may appear desirable during training but may fail at deployment. This problem is exacerbated in domains such as robotics, where the gap between an agent's open-loop and closed-loop performance can be large. In such cases, a causally confused model may appear to perform well according to open-loop metrics but fail catastrophically when deployed in the real world. In this paper, we conduct the first study of causal confusion in offline reinforcement learning and hypothesize that selectively sampling data points that help disambiguate the underlying causal mechanism of the environment may alleviate causal confusion. To investigate this hypothesis, we consider a set of simulated setups to study causal confusion and the ability of active sampling schemes to reduce its effects. We provide empirical evidence that random and active sampling schemes are able to consistently reduce causal confusion as training progresses and that active sampling is able to do so more efficiently than random sampling.

Author Information

Gunshi Gupta (University of Oxford)
Tim G. J. Rudner (University of Oxford)

Tim G. J. Rudner is a Computer Science PhD student at the University of Oxford supervised by Yarin Gal and Yee Whye Teh. His research interests span Bayesian deep learning, reinforcement learning, and variational inference. He obtained a master’s degree in statistics from the University of Oxford and an undergraduate degree in mathematics and economics from Yale University. Tim is also a Rhodes Scholar and a Fellow of the German National Academic Foundation.

Rowan McAllister (Toyota Research Institute)
Adrien Gaidon (Toyota Research Institute (TRI))
Yarin Gal (University of Oxford)

More from the Same Authors