Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning
Gregory Farquhar · Shimon Whiteson · Jakob Foerster

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #207

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
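The "objective that, under automatic differentiation, produces unbiased estimators of derivatives at any order" builds on the MagicBox operator introduced in the authors' earlier DiCE work: an expression that evaluates to 1 in the forward pass but, when differentiated, injects the score-function (log-probability gradient) term at every order. A minimal sketch of that operator in JAX, with a hypothetical stand-in log-probability and reward (not taken from the paper):

```python
import jax
import jax.numpy as jnp

def magic_box(x):
    # Evaluates to exp(0) = 1 in the forward pass, but differentiating it
    # reintroduces dx/dtheta, yielding the score-function gradient term.
    return jnp.exp(x - jax.lax.stop_gradient(x))

def surrogate(theta):
    # Hypothetical stand-ins: a log-probability linear in theta and a
    # reward that does not depend on theta directly.
    logp = 2.0 * theta
    reward = 3.0
    return magic_box(logp) * reward

value = surrogate(1.0)           # forward pass: magic_box is 1, so value = 3.0
grad = jax.grad(surrogate)(1.0)  # gradient: reward * dlogp/dtheta = 3 * 2 = 6.0
```

Because `magic_box` composes under repeated differentiation, the same surrogate yields correct higher-order derivatives automatically; Loaded DiCE extends this construction to incorporate arbitrary advantage estimators and discounting of distant causal dependencies.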

Author Information

Gregory Farquhar (University of Oxford)
Shimon Whiteson (University of Oxford)
Jakob Foerster (Facebook AI Research)

Jakob Foerster is a PhD student in AI at the University of Oxford under the supervision of Shimon Whiteson and Nando de Freitas. Using deep reinforcement learning, he studies the emergence of communication in multi-agent AI systems. Prior to his PhD, Jakob spent four years working at Google and Goldman Sachs. He has also worked on a number of research projects in systems neuroscience, including work at MIT and the Weizmann Institute.