Multi-armed bandit algorithms minimize the experimentation costs required to converge on optimal behavior. They do so by rapidly shifting experimentation effort away from poorly performing actions as feedback is observed. But this desirable feature makes them sensitive to confounding. We highlight, for instance, that popular bandit algorithms cannot address the problem of identifying the best action when day-of-week effects may confound inferences. In response, this paper formulates a general model of contextual bandit experiments with nonstationary contexts, which act as confounders for inferences and can also be viewed as distribution shifts in the earlier periods of the experiments. In addition, this general model allows the target distribution (the population distribution used to determine the best action) to differ from the empirical distribution over the contexts observed during the experiments. The paper proposes deconfounded Thompson sampling, which makes simple, but critical, modifications to the way Thompson sampling is usually applied. Theoretical guarantees suggest the algorithm strikes a delicate balance between adaptivity and robustness to confounding and distribution shifts. It attains asymptotic lower bounds on the number of samples required to confidently identify the best action, suggesting optimal adaptivity, but also satisfies strong performance guarantees in the presence of day-of-week effects and delayed observations, suggesting unusual robustness.
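To make the setting concrete, here is a minimal sketch of Thompson sampling over (action, context) cells in which the best action is judged under a fixed target distribution over contexts rather than under the contexts that happened to be observed. It is an illustration of the general idea only, not the paper's deconfounded Thompson sampling algorithm. The Bernoulli rewards, uniform Beta priors, cyclic context arrival, and the names `sample_best_arm`, `run_experiment`, and `target_weights` are all assumptions introduced for this example.

```python
import numpy as np


def sample_best_arm(successes, failures, target_weights, rng):
    """Draw one posterior sample per (arm, context) cell and return the arm
    whose sampled mean, averaged under target_weights, is largest."""
    # Assumed model: Bernoulli rewards with a uniform Beta(1, 1) prior per cell.
    theta = rng.beta(1.0 + successes, 1.0 + failures)  # shape (n_arms, n_contexts)
    # Weight contexts by the target distribution, not by how often each was seen.
    population_means = theta @ target_weights  # shape (n_arms,)
    return int(np.argmax(population_means))


def run_experiment(true_means, target_weights, n_rounds, seed=0):
    """Simulate an experiment where contexts arrive in a fixed cycle
    (for example, day of week) and return per-cell success/failure counts."""
    rng = np.random.default_rng(seed)
    n_arms, n_contexts = true_means.shape
    successes = np.zeros((n_arms, n_contexts))
    failures = np.zeros((n_arms, n_contexts))
    for t in range(n_rounds):
        context = t % n_contexts  # hypothetical nonstationary context sequence
        arm = sample_best_arm(successes, failures, target_weights, rng)
        reward = rng.random() < true_means[arm, context]
        if reward:
            successes[arm, context] += 1
        else:
            failures[arm, context] += 1
    return successes, failures
```

Because each cell keeps its own posterior, an action that happens to be tried mostly on favorable days is not credited with that day's effect when the sampled means are averaged under `target_weights`.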
Author Information
Chao Qin (Columbia University)
Daniel Russo (Columbia University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 : On Adaptivity and Confounding in Contextual Bandit Experiments »
Mon. Dec 13th 08:40 -- 08:50 PM Room
More from the Same Authors
-
2021 : On Adaptivity and Confounding in Contextual Bandit Experiments »
Chao Qin · Daniel Russo -
2022 Spotlight: An Analysis of Ensemble Sampling »
Chao Qin · Zheng Wen · Xiuyuan Lu · Benjamin Van Roy -
2022 Poster: An Analysis of Ensemble Sampling »
Chao Qin · Zheng Wen · Xiuyuan Lu · Benjamin Van Roy -
2022 Poster: Temporally-Consistent Survival Analysis »
Lucas Maystre · Daniel Russo -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Worst-Case Regret Bounds for Exploration via Randomized Value Functions »
Daniel Russo -
2017 Poster: Improving the Expected Improvement Algorithm »
Chao Qin · Diego Klabjan · Daniel Russo -
2014 Poster: Learning to Optimize via Information-Directed Sampling »
Daniel Russo · Benjamin Van Roy -
2013 Poster: (More) Efficient Reinforcement Learning via Posterior Sampling »
Ian Osband · Daniel Russo · Benjamin Van Roy -
2013 Poster: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
Daniel Russo · Benjamin Van Roy -
2013 Oral: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
Daniel Russo · Benjamin Van Roy