
A Closer Look at Gradient Estimators with Reinforcement Learning as Inference
Jonathan Lavington · Michael Teng · Mark Schmidt · Frank Wood
Event URL: https://openreview.net/forum?id=bR0K-nz1-6p

The concept of reinforcement learning as inference (RLAI) has led to the creation of a variety of popular algorithms in deep reinforcement learning. Unfortunately, most research in this area relies on wider algorithmic innovations not necessarily relevant to such frameworks. Additionally, many seemingly unimportant modifications made to these algorithms actually produce inconsistencies with the original inference problem posed by RLAI. Taking a divergence minimization perspective, this work considers some of the practical merits and theoretical issues created by the choice of loss function minimized in the policy update in off-policy reinforcement learning. Our results show that while the choice of divergence rarely has a major effect on the sample efficiency of the algorithm, it can have important practical repercussions for ease of implementation, computational efficiency, and restrictions on the distribution over actions.

Author Information

Jonathan Lavington (University of British Columbia)
Michael Teng (University of Oxford)
Mark Schmidt (University of British Columbia)
Frank Wood (Columbia University)
