In the ZONE: Measuring difficulty and progression in curriculum generation
Rose Wang · Jesse Mu · Dilip Arumugam · Natasha Jaques · Noah Goodman
Event URL: https://openreview.net/forum?id=SgBHmHMctfd
A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that enable student learning. But what kinds of tasks enable this? One answer is tasks belonging to a student's zone of proximal development (ZPD), a concept from developmental psychology. These are tasks that are neither too easy nor too hard for the student. Although intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student's probability of task success) and learning progression (the degree of change in the student's model parameters). ZONE instantiates two techniques that constrain the teacher to pick tasks within the student's ZPD. One is \textsc{Reject}, which rejects tasks outside of a difficulty scope, and the other is \textsc{Grad}, which prioritizes tasks that maximize the student's gradient norm. We apply these techniques to existing curriculum learning algorithms. We show that they improve the student's generalization performance on discrete MiniGrid environments and continuous control MuJoCo domains, with up to $9\times$ higher success. ZONE also accelerates the student's learning, training with $10\times$ less data.
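To make the two selection rules in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `low`/`high` thresholds, and the callables `success_prob` and `grad_norm` are all hypothetical stand-ins for whatever the student model would provide. It only illustrates the idea of filtering by difficulty and ranking by gradient norm.

```python
from typing import Callable, List, Sequence, TypeVar

Task = TypeVar("Task")


def reject_filter(
    tasks: Sequence[Task],
    success_prob: Callable[[Task], float],
    low: float = 0.2,   # assumed lower bound of the difficulty scope
    high: float = 0.8,  # assumed upper bound of the difficulty scope
) -> List[Task]:
    """Reject-style rule: keep only tasks whose estimated student
    success probability lies inside [low, high]."""
    return [t for t in tasks if low <= success_prob(t) <= high]


def grad_select(
    tasks: Sequence[Task],
    grad_norm: Callable[[Task], float],
) -> Task:
    """Grad-style rule: pick the task with the largest student
    gradient norm (a proxy for learning progression)."""
    return max(tasks, key=grad_norm)


# Toy numbers, invented purely for illustration.
toy_success = {"easy": 0.95, "medium": 0.7, "hard": 0.5, "impossible": 0.1}
toy_grad = {"easy": 0.1, "medium": 0.4, "hard": 0.9, "impossible": 2.0}
toy_tasks = list(toy_success)

kept = reject_filter(toy_tasks, toy_success.get)
picked = grad_select(toy_tasks, toy_grad.get)
```

With these toy values, `reject_filter` drops the tasks that are too easy or too hard and keeps `"medium"` and `"hard"`, while `grad_select` on its own picks whichever task has the largest gradient norm. How success probability and gradient norm are actually estimated is left open here.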

Author Information

Rose Wang (Stanford)
Jesse Mu (Stanford University)
Dilip Arumugam (Stanford University)
Natasha Jaques (Google Brain, UC Berkeley)

Natasha Jaques holds a joint position as a Research Scientist at Google Brain and Postdoctoral Fellow at UC Berkeley. Her research focuses on Social Reinforcement Learning in multi-agent and human-AI interactions. Natasha completed her PhD at MIT, where her thesis received the Outstanding PhD Dissertation Award from the Association for the Advancement of Affective Computing. Her work has also received Best Demo at NeurIPS, an honourable mention for Best Paper at ICML, Best of Collection in the IEEE Transactions on Affective Computing, and Best Paper at the NeurIPS workshops on ML for Healthcare and Cooperative AI. She has interned at DeepMind and Google Brain, and was an OpenAI Scholars mentor. Her work has been featured in Science Magazine, Quartz, MIT Technology Review, Boston Magazine, and on CBC radio. Natasha earned her Master's degree from the University of British Columbia, and undergraduate degrees in Computer Science and Psychology from the University of Regina.

Noah Goodman (Stanford University)

More from the Same Authors