Informative rewards and generalization in curriculum learning
Rahul Siripurapu · Vihang Patil · Kajetan Schweighofer · Marius-Constantin Dinu · Markus Holzleitner · Hamid Eghbalzadeh · Luis Ferro · Thomas Schmied · Michael Kopp · Sepp Hochreiter
Event URL: https://openreview.net/forum?id=9CvMkA8oi8O

Curriculum learning (CL) is as central to human learning as reinforcement learning (RL) itself. However, CL agents trained using RL with function approximation show limited generalization to later tasks in the curriculum. One contributing factor might be exploration itself. Exploration often induces the agent to visit task-irrelevant states, leading to training-induced non-stationarities. The value and policy networks then spend their limited capacity fitting targets for these irrelevant states, which impairs generalization to later tasks. First, we propose an online distillation method to alleviate this problem in CL. We show that one can use a learned, informative reward function to minimize exploration and, consequently, non-stationarities during the distillation process. Second, we show that minimizing exploration improves capacity utilization as measured by feature rank. Finally, we illuminate the links between exploration, non-stationarity, capacity, and generalization in the CL setting. In conclusion, we see this as a crucial step toward improving the generalization of deep RL methods in curriculum learning.
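
The abstract uses feature rank as its measure of capacity utilization but does not define it here. As a rough illustration only, the sketch below assumes one common definition: the smallest number of singular values of a feature matrix (one row per state, one column per penultimate-layer unit) needed to capture a 1 - delta share of the total singular-value mass. The function name `feature_rank`, the `delta` threshold, and the choice of definition are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def feature_rank(features, delta=0.01):
    """Assumed definition: smallest k whose top-k singular values
    hold at least (1 - delta) of the total singular-value mass."""
    features = np.asarray(features, dtype=float)
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    total = singular_values.sum()
    if total == 0.0:
        return 0  # an all-zero feature matrix carries no information
    cumulative = np.cumsum(singular_values) / total
    # First index where the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, 1.0 - delta)) + 1
    return min(k, len(singular_values))


# Hypothetical feature matrix: 4 states, 3 features, only 2 used.
collapsed = np.zeros((4, 3))
collapsed[0, 0] = 1.0
collapsed[1, 1] = 1.0
rank_collapsed = feature_rank(collapsed)
rank_full = feature_rank(np.eye(3))
```

Under this assumed definition, a lower value on a fixed batch of states would indicate that the network represents them in fewer effective dimensions, which is one way to read "capacity utilization" in the text above.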

Author Information

Rahul Siripurapu (IARAI)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Kajetan Schweighofer (Johannes Kepler Universität Linz)
Marius-Constantin Dinu (Johannes Kepler University Linz)
Markus Holzleitner (ELLIS Unit / University Linz)
Hamid Eghbalzadeh (Meta)
Luis Ferro (Institute of Advanced Research in Artificial Intelligence)
Thomas Schmied (ELLIS Unit / University Linz)
Michael Kopp (Institute of Advanced Research in Artificial Intelligence (IARAI) GmbH)
Sepp Hochreiter (ELLIS Unit / University Linz)
