
Informative rewards and generalization in curriculum learning
Rahul Siripurapu · Vihang Patil · Kajetan Schweighofer · Marius-Constantin Dinu · Markus Holzleitner · Hamid Eghbalzadeh · Luis Ferro · Thomas Schmied · Michael Kopp · Sepp Hochreiter

Curriculum learning (CL) is as central to human learning as reinforcement learning (RL) itself. However, CL agents trained with RL and function approximation generalize poorly to later tasks in the curriculum. One contributing factor may be exploration itself: exploration often induces the agent to visit task-irrelevant states, leading to training-induced non-stationarities, so the value/policy networks spend their limited capacity fitting targets for these irrelevant states, which in turn impairs generalization to later tasks. First, we propose an online distillation method to alleviate this problem in CL, and show that a learned, informative reward function can be used to minimize exploration and, consequently, non-stationarities during distillation. Second, we show that minimizing exploration improves capacity utilization as measured by feature rank. Finally, we illuminate the links between exploration, non-stationarity, capacity, and generalization in the CL setting. We see this as a crucial step toward improving the generalization of deep RL methods in curriculum learning.
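The abstract measures capacity utilization via feature rank. The paper does not spell out its exact definition here, but a common formulation in the deep RL literature is the effective rank of the feature matrix: the smallest number of singular values needed to capture a (1 − δ) fraction of the spectrum. A minimal sketch under that assumption (the function name and δ = 0.01 default are illustrative, not the authors' code):

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Smallest k such that the top-k singular values of the
    feature matrix account for a (1 - delta) fraction of the
    total singular-value mass."""
    s = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

# Toy example: 256 feature vectors confined to a 3-dimensional
# subspace of a 16-dimensional feature space. The effective rank
# stays low despite the 16 nominal feature dimensions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 3)) @ rng.normal(size=(3, 16))
print(effective_rank(feats))
```

Intuitively, a network that wastes capacity fitting targets for task-irrelevant states can drive this rank down, leaving fewer usable feature directions for later tasks in the curriculum.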