Skip to yearly menu bar Skip to main content


Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations

Guanren Qiao · Guiliang Liu · Pascal Poupart · Zhiqiang Xu

Great Hall & Hall B1+B2 (level 1) #1306
[ ] [ Project Page ]
[ Paper [ Poster [ OpenReview
Tue 12 Dec 3:15 p.m. PST — 5:15 p.m. PST


Inverse Constraint Reinforcement Learning (ICRL) aims to recover the underlying constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms typically assume that the demonstration data is generated by a single type of expert. However, in practice, demonstrations often comprise a mixture of trajectories collected from various expert agents respecting different constraints, making it challenging to explain expert behaviors with a unified constraint function. To tackle this issue, we propose a Multi-Modal Inverse Constrained Reinforcement Learning (MMICRL) algorithm for simultaneously estimating multiple constraints corresponding to different types of experts. MMICRL constructs a flow-based density estimator that enables unsupervised expert identification from demonstrations, so as to infer the agent-specific constraints. Following these constraints, MMICRL imitates expert policies with a novel multi-modal constrained policy optimization objective that minimizes the agent-conditioned policy entropy and maximizes the unconditioned one. To enhance robustness, we incorporate this objective into the contrastive learning framework. This approach enables imitation policies to capture the diversity of behaviors among expert agents. Extensive experiments in both discrete and continuous environments show that MMICRL outperforms other baselines in terms of constraint recovery and control performance.

Chat is not available.