NeurIPS What Would the Expert do()?: Causal Imitation Learning

Oral in Workshop: Safe and Robust Control of Uncertain Systems

Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu

Abstract:

We develop algorithms for imitation learning from data corrupted by unobserved confounders. Sources of such confounding include (a) persistent perturbations to actions and (b) the expert responding to a part of the state that the learner cannot access. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. By using the effect of past states on current states, we are able to break up these spurious correlations, an application of the econometric technique of instrumental variable regression. This insight leads to two novel algorithms: one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run offline. Both approaches are able to find policies that match the result of a query to an unconfounded expert. We find that both algorithms compare favorably to non-causal approaches on simulated control problems.
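To make the instrumental-variable idea concrete, the following is a minimal sketch on a scalar linear toy problem. It is not DoubIL or ResiduIL; it is plain two-stage least squares, and every name and constant in it (`simulate`, `ols_slope`, `iv_slope`, the dynamics coefficients, the two-step moving-average noise) is an assumption chosen for illustration. The expert acts as `theta * s_t + u_t`, where the perturbation `u_t` shares a shock with `u_{t-1}`. Because that shared shock also drove the previous action, it leaks into the current state, so regressing actions on states directly is biased. The past state is unaffected by the shock yet still predicts the current state, so it can serve as an instrument.

```python
import numpy as np


def simulate(n_steps, theta=-0.5, a_dyn=0.9, b_dyn=1.0, seed=0):
    """Roll out a hypothetical scalar linear system under a confounded expert."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n_steps + 1)  # i.i.d. shocks that build the action noise
    w = rng.normal(size=n_steps)      # independent process noise
    states = np.zeros(n_steps)
    actions = np.zeros(n_steps)
    s = 0.0
    for t in range(n_steps):
        # Persistent perturbation: u_t and u_{t-1} both contain e[t].
        u = e[t + 1] + e[t]
        a = theta * s + u
        states[t] = s
        actions[t] = a
        s = a_dyn * s + b_dyn * a + w[t]
    return states, actions


def ols_slope(x, y):
    """Least-squares slope of y on x (no intercept; the toy data is zero-mean)."""
    return float(x @ y / (x @ x))


def iv_slope(z, x, y):
    """Two-stage least squares with a single instrument z."""
    # Stage 1: keep only the part of x that is predictable from z.
    x_hat = z * (z @ x) / (z @ z)
    # Stage 2: regress y on that predicted part.
    return float(x_hat @ y / (x_hat @ x_hat))


true_theta = -0.5
states, actions = simulate(50_000, theta=true_theta)

# Naive behavioral-cloning-style regression of a_t on s_t.
theta_ols = ols_slope(states[1:], actions[1:])
# Same regression, with the past state s_{t-1} as the instrument for s_t.
theta_iv = iv_slope(states[:-1], states[1:], actions[1:])

print(f"true gain: {true_theta:.3f}")
print(f"OLS estimate: {theta_ols:.3f}")
print(f"IV estimate:  {theta_iv:.3f}")
```

In this toy setting the direct regression is pulled away from the true gain, while the instrumented estimate lands close to it. The sketch relies on the noise correlation spanning only one step; if the confounder persisted longer, a state from further in the past would be needed as the instrument.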
