Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay
Iryna Korshunova · Minqi Jiang · Jack Parker-Holder · Tim Rocktäschel · Edward Grefenstette
Event URL: https://openreview.net/forum?id=B6VyCbPVyPb

Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels, weighting each level by a function of the temporal-difference (TD) errors recently experienced on it. We explore the dispersion of returns as an alternative prioritization criterion that addresses certain issues with TD-error scores.
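The prioritization scheme described above can be sketched as follows. This is a minimal, illustrative implementation, not the authors' code: it assumes per-level return histories are available, scores each level by the standard deviation of its recent returns (the "return dispersion" criterion), and converts scores into a sampling distribution via the rank-based prioritization used in PLR, with a hypothetical temperature parameter `beta`.

```python
import numpy as np

def level_scores(returns_per_level):
    """Score each level by the dispersion (standard deviation)
    of the returns recently achieved on it."""
    return np.array([np.std(r) for r in returns_per_level])

def replay_distribution(scores, beta=0.1):
    """Rank-based prioritization: weight of each level is
    proportional to (1 / rank)^(1 / beta), where rank 1 is
    the highest-scoring level."""
    order = np.argsort(-scores)          # indices sorted by descending score
    ranks = np.empty_like(scores)
    ranks[order] = np.arange(1, len(scores) + 1)
    weights = (1.0 / ranks) ** (1.0 / beta)
    return weights / weights.sum()

# Example: three levels with recent return histories.
returns = [[1.0, 1.0, 1.0],    # zero dispersion -> low priority
           [0.0, 10.0, 5.0],   # high dispersion -> high priority
           [2.0, 3.0, 2.5]]
probs = replay_distribution(level_scores(returns))
```

In practice PLR also mixes this replay distribution with a staleness term so that rarely visited levels are eventually resampled; that component is omitted here for brevity.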

Author Information

Iryna Korshunova (Ghent University)
Minqi Jiang (UCL & FAIR)
Jack Parker-Holder (University of Oxford)
Tim Rocktäschel (Facebook AI Research)
Edward Grefenstette (Facebook AI Research & University College London)
