In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, each reward becomes available to the decision maker only after a delay that is unknown and stochastic, even though a decision must be made at every time step for an incoming set of contexts. We study the performance of upper confidence bound (UCB) based algorithms adapted to this delayed setting. In particular, we design a delay-adaptive algorithm, which we call Delayed UCB, for generalized linear contextual bandits using UCB-style exploration, and we establish regret bounds under various delay assumptions. In the important special case of linear contextual bandits, we further modify this algorithm and establish a tighter regret bound under the same delay assumptions. Our results contribute to the contextual bandit literature by establishing that UCB algorithms, which are widely deployed in modern recommendation engines, can be made robust to delays.
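To make the delayed-feedback setting concrete, the following is a minimal sketch of a UCB learner for the linear special case that updates its estimate only with rewards that have already arrived. It is not the paper's Delayed UCB algorithm: the class `DelayedLinUCB`, the parameters `alpha` and `reg`, the uniform delay distribution, and the value of `true_theta` are all assumptions chosen for illustration, and a standard ridge-regression confidence width stands in for the paper's delay-adaptive one.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class DelayedLinUCB:
    """Illustrative linear UCB that learns only from rewards that have arrived."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed, not from the paper)
        # Inverse of (reg * I + sum of x x^T over observed rewards).
        self.A_inv = [[1.0 / reg if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim
        self.pending = []  # (arrival_time, context, reward), not yet observed

    def _matvec(self, v):
        return [dot(row, v) for row in self.A_inv]

    def theta(self):
        # Ridge estimate built from observed rewards only.
        return self._matvec(self.b)

    def select(self, contexts):
        # Pick the context with the largest optimistic score.
        theta = self.theta()
        scores = []
        for x in contexts:
            width = math.sqrt(max(dot(x, self._matvec(x)), 0.0))
            scores.append(dot(theta, x) + self.alpha * width)
        return max(range(len(contexts)), key=scores.__getitem__)

    def record(self, t, x, reward, delay):
        # The reward is generated at time t but becomes visible at t + delay.
        self.pending.append((t + delay, x, reward))

    def absorb(self, t):
        # Fold in every reward whose arrival time is at most t.
        arrived = [p for p in self.pending if p[0] <= t]
        self.pending = [p for p in self.pending if p[0] > t]
        for _, x, r in arrived:
            Ax = self._matvec(x)
            denom = 1.0 + dot(x, Ax)
            # Sherman-Morrison rank-one update of the inverse.
            for i in range(self.dim):
                for j in range(self.dim):
                    self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
                self.b[i] += r * x[i]
        return len(arrived)


def simulate(horizon=200, seed=0):
    rng = random.Random(seed)
    true_theta = [0.8, -0.3]  # hypothetical ground truth for the demo
    agent = DelayedLinUCB(dim=2)
    for t in range(horizon):
        agent.absorb(t)  # observe whatever has arrived by now
        contexts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(3)]
        x = contexts[agent.select(contexts)]
        reward = dot(true_theta, x) + rng.gauss(0, 0.1)
        agent.record(t, x, reward, delay=rng.randint(1, 10))
    return agent
```

Calling `simulate()` runs the loop with random delays between 1 and 10 steps. The key design point is that `select` never sees entries still in `pending`, so the learner acts on stale information until delayed rewards are absorbed.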
Author Information
Zhengyuan Zhou (Stanford University)
Renyuan Xu (University of Oxford)
Renyuan Xu is currently a Hooke Research Fellow in the Mathematical Institute at the University of Oxford. Prior to that, she obtained her Ph.D. from the IEOR Department at UC Berkeley in 2019 and her Bachelor's degree in Mathematics from the University of Science and Technology of China in 2014.
Jose Blanchet (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Spotlight: Learning in Generalized Linear Contextual Bandits with Stochastic Delays
  Thu. Dec 12th 12:45 -- 12:50 AM Room West Exhibition Hall A
More from the Same Authors
- 2022 : Minimax Optimal Kernel Operator Learning via Multilevel Training
  Jikai Jin · Yiping Lu · Jose Blanchet · Lexing Ying
- 2022 : Synthetic Principle Component Design: Fast Covariate Balancing with Synthetic Controls
  Yiping Lu · Jiajin Li · Lexing Ying · Jose Blanchet
- 2022 Poster: Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
  Yiping Lu · Jose Blanchet · Lexing Ying
- 2022 Poster: Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints
  Jiajin Li · Sirui Lin · Jose Blanchet · Viet Anh Nguyen
- 2021 : Statistical Numerical PDE : Fast Rate, Neural Scaling Law and When it’s Optimal
  Yiping Lu · Haoxuan Chen · Jianfeng Lu · Lexing Ying · Jose Blanchet
- 2021 Poster: Adversarial Regression with Doubly Non-negative Weighting Matrices
  Tam Le · Truyen Nguyen · Makoto Yamada · Jose Blanchet · Viet Anh Nguyen
- 2021 Poster: Modified Frank Wolfe in Probability Space
  Carson Kent · Jiajin Li · Jose Blanchet · Peter W Glynn
- 2020 Poster: Distributionally Robust Parametric Maximum Likelihood Estimation
  Viet Anh Nguyen · Xuhui Zhang · Jose Blanchet · Angelos Georghiou
- 2020 Poster: Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities
  Chaobing Song · Zhengyuan Zhou · Yichao Zhou · Yong Jiang · Yi Ma
- 2020 Poster: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
  Nian Si · Jose Blanchet · Soumyadip Ghosh · Mark Squillante
- 2020 Spotlight: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
  Nian Si · Jose Blanchet · Soumyadip Ghosh · Mark Squillante
- 2020 Poster: Distributionally Robust Local Non-parametric Conditional Estimation
  Viet Anh Nguyen · Fan Zhang · Jose Blanchet · Erick Delage · Yinyu Ye
- 2019 Poster: Learning Mean-Field Games
  Xin Guo · Anran Hu · Renyuan Xu · Junzi Zhang
- 2019 Poster: Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
  Ilai Bistritz · Zhengyuan Zhou · Xi Chen · Nicholas Bambos · Jose Blanchet
- 2019 Poster: Multivariate Distributionally Robust Convex Regression under Absolute Error Loss
  Jose Blanchet · Peter W Glynn · Jun Yan · Zhengqing Zhou
- 2019 Poster: Semi-Parametric Dynamic Contextual Pricing
  Virag Shah · Ramesh Johari · Jose Blanchet
- 2018 Poster: Learning in Games with Lossy Feedback
  Zhengyuan Zhou · Panayotis Mertikopoulos · Susan Athey · Nicholas Bambos · Peter W Glynn · Yinyu Ye
- 2017 Poster: Countering Feedback Delays in Multi-Agent Learning
  Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Peter W Glynn · Claire Tomlin
- 2017 Poster: Stochastic Mirror Descent in Variationally Coherent Optimization Problems
  Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Stephen Boyd · Peter W Glynn