Skip to yearly menu bar Skip to main content

Workshop: Deep Reinforcement Learning Workshop

In the ZONE: Measuring difficulty and progression in curriculum generation

Rose Wang · Jesse Mu · Dilip Arumugam · Natasha Jaques · Noah Goodman

Abstract: A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that enable student learning. But, what kind of tasks enables this? One answer is tasks belonging to a student's zone of proximal development (ZPD), a concept from developmental psychology. These are tasks that are not too easy and not too hard for the student. Albeit intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student's probability of task success) and learning progression (the degree of change in the student's model parameters). ZONE instantiates two techniques that enforce the teacher to pick tasks within the student's ZPD. One is \textsc{Reject}, which rejects tasks outside of a difficulty scope, and the other is \textsc{Grad}, which prioritizes tasks that maximize the student's gradient norm. We apply these techniques to existing curriculum learning algorithms. We show that they improve the student’s generalization performance on discrete MiniGrid environments and continuous control MuJoCo domains with up to $9 \times$ higher success. ZONE also accelerates the student's learning by training with $10\times$ less data.

Chat is not available.