What Would the Expert do()?: Causal Imitation Learning
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu

We develop algorithms for imitation learning from data that was corrupted by unobserved confounders. Sources of such confounding include (a) persistent perturbations to actions or (b) the expert responding to a part of the state that the learner does not have access to. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. By utilizing the effect of past states on current states, we are able to break up these spurious correlations, an application of the econometric technique of instrumental variable regression. This insight leads to two novel algorithms, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator and one of a game-theoretic flavor (ResiduIL) that can be run offline. Both approaches are able to find policies that match the result of a query to an unconfounded expert. We find that both algorithms compare favorably to non-causal approaches on simulated control problems.
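The key idea the abstract invokes, instrumental variable regression, can be illustrated with a minimal two-stage least squares sketch. This is a hypothetical toy example, not the paper's DoubIL or ResiduIL algorithms: a scalar "state" x and "action" y are both corrupted by an unobserved confounder u, and an instrument z (in the paper's setting, the past state) lets us recover the unconfounded relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved confounder u corrupts both the regressor and the target.
u = rng.normal(size=n)
z = rng.normal(size=n)                  # instrument: drives x, has no direct effect on y
x = z + u + 0.1 * rng.normal(size=n)    # observed "state" (endogenous regressor)
true_coef = 2.0
y = true_coef * x + u + 0.1 * rng.normal(size=n)  # observed "action"

# Naive least squares is biased: x correlates with the confounder u,
# so the learner latches onto the spurious u-induced correlation.
ols_coef = (x @ y) / (x @ x)

# Two-stage least squares.
# Stage 1: project x onto the instrument z, keeping only the
# confounder-free variation in x.
x_hat = z * ((z @ x) / (z @ z))
# Stage 2: regress y on the projected x_hat; the bias from u is gone.
iv_coef = (x_hat @ y) / (x_hat @ x_hat)
```

Here `ols_coef` overshoots the true coefficient of 2.0 because u raises both x and y, while `iv_coef` recovers it: only the part of x explained by the instrument is used in the second stage, breaking the spurious correlation, which is the mechanism the paper exploits with past states as instruments.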

Author Information

Gokul Swamy (Carnegie Mellon University)
Sanjiban Choudhury (Aurora Innovation)
James Bagnell (Aurora Innovation)
Steven Wu (Carnegie Mellon University)

I am an Assistant Professor in the School of Computer Science at Carnegie Mellon University. My broad research interests are in algorithms and machine learning. These days I am excited about:
- Foundations of responsible AI, with emphasis on privacy and fairness considerations.
- Interactive learning, including contextual bandits and reinforcement learning, and its interactions with causal inference and econometrics.
- Economic aspects of machine learning, with a focus on learning in the presence of strategic agents.
