Curriculum Learning as Transport: Training Along Wasserstein Geodesics
Abstract
Curriculum learning is widely used but remains poorly understood: its success often depends on ad-hoc design choices, making it unclear when and why it helps. We investigate these questions through the lens of Wasserstein curriculum learning, a framework that interpolates between easy and hard data distributions along the Wasserstein geodesic connecting them. While prior work has applied this idea in narrow reinforcement learning settings, we present the first controlled study across synthetic tasks and reasoning benchmarks with large language models. Our experiments map how the choice of endpoint distributions and the rate of traversal along the geodesic shape learning, and identify mechanisms by which curricula accelerate progress, especially on difficult examples. Empirically, Wasserstein curricula can reach a target accuracy with fewer examples than uniform sampling or linear blends of the endpoint distributions, but only under specific conditions. Together, these findings clarify when and why Wasserstein curricula are effective, advancing our understanding of curriculum learning beyond task-specific heuristics.
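To make the contrast between a geodesic interpolation and a linear blend concrete, the following is a minimal sketch, not the procedure used in this paper. It assumes each example is summarized by a single scalar difficulty score (a hypothetical simplification) and that both endpoint sets contain the same number of equally weighted samples. Under those assumptions the one-dimensional Wasserstein-2 geodesic is obtained by sorting both sets and interpolating matched ranks, whereas a linear blend simply mixes draws from the two endpoints. The function names `geodesic_samples` and `mixture_samples` are illustrative only.

```python
import numpy as np


def geodesic_samples(easy, hard, t):
    """Samples at time t on the 1-D W2 geodesic between two empirical
    distributions (assumed: equal sample counts, uniform weights)."""
    easy = np.sort(np.asarray(easy, dtype=float))
    hard = np.sort(np.asarray(hard, dtype=float))
    if easy.shape != hard.shape:
        raise ValueError("this sketch assumes equal sample counts")
    # In 1-D the optimal coupling pairs samples of equal rank, so the
    # geodesic moves each easy point in a straight line to its partner.
    return (1.0 - t) * easy + t * hard


def mixture_samples(easy, hard, t, rng):
    """Samples from the linear blend (1 - t) * easy + t * hard, i.e. each
    draw comes from the hard set with probability t, else the easy set."""
    easy = np.asarray(easy, dtype=float)
    hard = np.asarray(hard, dtype=float)
    n = easy.shape[0]
    take_hard = rng.random(n) < t
    return np.where(take_hard, rng.choice(hard, size=n), rng.choice(easy, size=n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    easy_scores = [0.0, 1.0, 2.0]     # hypothetical difficulty scores
    hard_scores = [10.0, 11.0, 12.0]  # hypothetical difficulty scores
    print(geodesic_samples(easy_scores, hard_scores, 0.5))
    print(mixture_samples(easy_scores, hard_scores, 0.5, rng))
```

At `t = 0.5` the geodesic samples in this toy case are `[5, 6, 7]`, all of intermediate difficulty, while the linear blend still contains only the original easy and hard values. In higher dimensions or with unequal sample counts the rank-matching shortcut no longer applies and an optimal transport solver would be needed.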