Revisiting Bellman Errors for Offline Model Selection
Joshua Zitovsky · Rishabh Agarwal · Daniel de Marchi · Michael Kosorok
Event URL: https://openreview.net/forum?id=bmB3nlZbQd
Applying offline reinforcement learning in real-world settings necessitates the ability to tune hyperparameters offline, a task known as $\textit{offline model selection}$. It is well known that empirical Bellman errors are poor predictors of value function estimation accuracy and policy performance. This has led researchers to abandon model selection procedures based on Bellman errors and instead focus on evaluating the expected return under policies of interest. The problem with this approach is that it can be very difficult to use an offline dataset generated by one policy to estimate the expected returns of a different policy. In contrast, we argue that Bellman errors can be useful for offline model selection, and that the discouraging results in past literature have been due to estimating and utilizing them incorrectly. We propose a new algorithm, $\textit{Supervised Bellman Validation}$, that estimates the expected squared Bellman error better than the empirical Bellman errors do. We demonstrate the relative merits of our method over competing methods through both theoretical results and empirical results on datasets from the Atari benchmark. We hope that our results will challenge current attitudes and spur future research into Bellman errors and their utility in offline model selection.
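
The abstract does not define its quantities, so the following is only an illustrative sketch under standard textbook definitions, and it is not the Supervised Bellman Validation algorithm. It assumes a small tabular MDP with known dynamics and compares the true mean squared Bellman error of a candidate Q-function with the empirical estimate built from single sampled transitions. The function name `bellman_errors`, the uniform weighting over state-action pairs, and the toy MDP are all assumptions made for illustration.

```python
import numpy as np

def bellman_errors(q, rewards, next_probs, gamma, n_samples, seed=0):
    """Return (true, empirical) mean squared Bellman error of a tabular Q-function.

    q:          (S, A) candidate Q-values
    rewards:    (S, A) deterministic rewards (assumption for simplicity)
    next_probs: (S, A, S) transition probabilities
    State-action pairs are weighted uniformly (assumption).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = q.shape
    v_next = q.max(axis=1)  # greedy value of each next state

    # True error: uses the exact expectation over next states.
    backup = rewards + gamma * next_probs @ v_next
    true_msbe = np.mean((q - backup) ** 2)

    # Empirical error: one sampled next state per transition.
    s = rng.integers(n_states, size=n_samples)
    a = rng.integers(n_actions, size=n_samples)
    cum = next_probs[s, a].cumsum(axis=1)
    u = rng.random(n_samples)
    s_next = np.minimum((u[:, None] > cum).sum(axis=1), n_states - 1)
    td = q[s, a] - (rewards[s, a] + gamma * v_next[s_next])
    empirical_msbe = np.mean(td ** 2)
    return true_msbe, empirical_msbe

# Toy MDP (hypothetical): 2 states, 1 action, uniformly random transitions.
gamma = 0.9
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([0.0, 1.0])
# With a single action, the exact Q-function solves (I - gamma * P) q = r.
q_exact = np.linalg.solve(np.eye(2) - gamma * P, r).reshape(2, 1)

true_msbe, empirical_msbe = bellman_errors(
    q_exact, r.reshape(2, 1), P[:, None, :], gamma, n_samples=1000
)
```

In this toy case the exact Q-function has a true Bellman error of zero up to floating-point precision, yet the empirical estimate stays strictly positive because each squared single-sample error also contains the variance of the sampled target. This is one standard reason empirical Bellman errors can mislead model selection when transitions are stochastic.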

Author Information

Joshua Zitovsky (University of North Carolina at Chapel Hill)
Rishabh Agarwal (Google Research, Brain Team)

My research mainly revolves around deep reinforcement learning (RL), often with the goal of making RL methods suitable for real-world problems, and has received an outstanding paper award at NeurIPS.

Daniel de Marchi (Gillings School of Public Health, Dept. of Biostatistics)
Michael Kosorok (University of North Carolina at Chapel Hill)

Michael R. Kosorok, PhD, is the W.R. Kenan, Jr. Distinguished Professor of Biostatistics and Professor of Statistics and Operations Research at the University of North Carolina at Chapel Hill. Research interests include reinforcement learning, precision medicine, and decision support.

More from the Same Authors