
Workshop: Deep Reinforcement Learning Workshop

Learning a Domain-Agnostic Policy through Adversarial Representation Matching for Cross-Domain Policy Transfer

Hayato Watahiki · Ryo Iwase · Ryosuke Unno · Yoshimasa Tsuruoka


The low transferability of learned policies is one of the most critical problems limiting the applicability of learning-based solutions to decision-making tasks. In this paper, we present a way to align latent representations of states and actions between different domains by optimizing an adversarial objective. We train two models, a policy and a domain discriminator, with unpaired trajectories of proxy tasks through behavioral cloning as well as adversarial training. After the latent representations are aligned between domains, a domain-agnostic part of the policy trained with any method in the source domain can be immediately transferred to the target domain in a zero-shot manner. We empirically show that our simple approach achieves comparable performance to the latest methods in zero-shot cross-domain transfer. We also observe that our method performs better than other approaches in transfer between domains with different complexities, whereas other methods fail catastrophically.
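The interplay between the two training signals described above can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the squared-error behavioral cloning term, the label convention (source = 1, target = 0), and the `adv_weight` parameter are all assumptions made for illustration. The discriminator minimizes a binary cross-entropy over latent representations, while the policy minimizes its behavioral cloning error minus that same cross-entropy, so it is rewarded when the discriminator cannot tell the domains apart.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_loss(logits_source, logits_target):
    """Binary cross-entropy for a domain discriminator.

    Assumed convention: source-domain latents are labeled 1 and
    target-domain latents are labeled 0. Inputs are raw logits.
    """
    eps = 1e-12  # guards against log(0)
    loss_s = [-math.log(sigmoid(z) + eps) for z in logits_source]
    loss_t = [-math.log(1.0 - sigmoid(z) + eps) for z in logits_target]
    return (sum(loss_s) + sum(loss_t)) / (len(loss_s) + len(loss_t))


def policy_loss(bc_errors, logits_source, logits_target, adv_weight=1.0):
    """Behavioral cloning term plus an adversarial alignment term.

    bc_errors: per-sample differences between predicted and demonstrated
    actions (hypothetical squared-error BC loss).
    The adversarial term is the negated discriminator loss, so the policy
    is rewarded when the discriminator is confused.
    """
    bc = sum(e * e for e in bc_errors) / len(bc_errors)
    adv = -discriminator_loss(logits_source, logits_target)
    return bc + adv_weight * adv


# A discriminator that outputs logit 0 everywhere is maximally confused.
confused = discriminator_loss([0.0, 0.0], [0.0, 0.0])
# A discriminator that separates the domains well has a low loss.
confident = discriminator_loss([5.0, 5.0], [-5.0, -5.0])
```

In this sketch, `confused` equals log 2 (chance level), and the policy loss is lower when the latents are indistinguishable than when the discriminator separates them, which is the pressure that would drive the representations of the two domains together.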
