An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control
Nicholas Ioannidis · Jonathan Lavington · Mark Schmidt
Event URL: https://openreview.net/forum?id=KvDedKtOX7B

Off-policy reinforcement learning (RL) algorithms can take advantage of samples generated from all previous interactions with the environment through "experience replay". Such methods outperform almost all on-policy and model-based alternatives in complex tasks where a structured or well-parameterized model of the world does not exist. This makes them desirable for practitioners who lack domain-specific knowledge but still require high sample efficiency. However, this high performance can come at a cost: because of the additional hyperparameters introduced to learn function approximators efficiently, off-policy RL can perform poorly on new problems. To address this hyperparameter sensitivity, we show how the correct choice of non-uniform sampling for experience replay can stabilize model performance under varying environmental conditions and hyperparameters.

Author Information

Nicholas Ioannidis (University of British Columbia)
Jonathan Lavington (University of British Columbia)
Mark Schmidt (University of British Columbia)