As bandit algorithms are increasingly used in scientific studies and industrial applications, there is a growing need for reliable inference methods for the resulting adaptively collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We prove that the bandit arm selection probabilities cannot generally be assumed to concentrate. Non-concentration of the arm selection probabilities makes inference on adaptively collected data challenging because classical statistical inference approaches, such as those based on asymptotic normality or the bootstrap, can have inflated Type-1 error and confidence intervals with below-nominal coverage probabilities, even asymptotically. In response, we develop the Batched Ordinary Least Squares estimator (BOLS), which we prove is (1) asymptotically normal on data collected from both multi-armed and contextual bandits and (2) robust to non-stationarity in the baseline reward, and which thus leads to reliable Type-1 error control and accurate confidence intervals.
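The abstract does not spell out the estimator, so the following is only a minimal sketch of one way a batched OLS test statistic could be computed for a two-arm bandit. It rests on several assumptions that are not stated above: rewards are grouped by batch, every batch contains pulls of both arms, the noise standard deviation `sigma` is known (in practice it would have to be estimated), and the per-batch statistics are combined by summing and dividing by the square root of the number of batches. The function name `batched_ols_z` and its parameters are hypothetical.

```python
import math
from statistics import fmean


def batched_ols_z(batches, sigma=1.0, delta0=0.0):
    """Sketch of a batched OLS Z-statistic for a two-arm bandit.

    batches: list of batches; each batch is a list of (arm, reward)
             pairs with arm in {0, 1}.
    sigma:   assumed-known noise standard deviation (an assumption).
    delta0:  hypothesized treatment effect under the null.
    Returns (combined_z, per_batch_z).
    """
    per_batch_z = []
    for batch in batches:
        rewards0 = [r for arm, r in batch if arm == 0]
        rewards1 = [r for arm, r in batch if arm == 1]
        if not rewards0 or not rewards1:
            raise ValueError("each batch must contain both arms")
        n = len(batch)
        # Within-batch OLS estimate of the treatment effect:
        # difference of the two arm means in this batch only.
        delta_hat = fmean(rewards1) - fmean(rewards0)
        # Normalize by the within-batch arm counts, so the scaling
        # does not rely on selection probabilities concentrating.
        scale = math.sqrt(len(rewards0) * len(rewards1) / n)
        per_batch_z.append(scale * (delta_hat - delta0) / sigma)
    # Combine the per-batch statistics into a single statistic.
    combined_z = sum(per_batch_z) / math.sqrt(len(per_batch_z))
    return combined_z, per_batch_z


if __name__ == "__main__":
    toy = [
        [(0, 1.0), (0, 3.0), (1, 4.0), (1, 6.0)],
        [(0, 2.0), (1, 2.0)],
    ]
    z, per_batch = batched_ols_z(toy)
    print(z, per_batch)
```

Because each batch is normalized separately and the effect is estimated as a within-batch difference, a baseline reward that shifts from batch to batch cancels out of each `delta_hat`, which is the intuition behind the robustness to non-stationarity claimed in the abstract. Under the stated assumptions, the combined statistic would be compared against standard normal quantiles.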
Author Information
Kelly Zhang (Harvard University)
Lucas Janson (Harvard University)
Susan Murphy (Harvard University)
More from the Same Authors
- 2021 Workshop: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice
  Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Xinkun Nie · Masatoshi Uehara · Kelly Zhang
- 2021 Poster: Statistical Inference with M-Estimators on Adaptively Collected Data
  Kelly Zhang · Lucas Janson · Susan Murphy
- 2020: Invited Talk: Assessing Personalization in Digital Health
  Susan Murphy
- 2020 Workshop: Machine Learning for Mobile Health
  Joseph Futoma · Walter Dempsey · Katherine Heller · Yian Ma · Nicholas Foti · Marianne Njifon · Kelly Zhang · Jieru Shi
- 2020 Poster: Cross-validation Confidence Intervals for Test Error
  Pierre Bayle · Alexandre Bayle · Lucas Janson · Lester Mackey
- 2019: Susan Murphy
  Susan Murphy