Timezone: »
Poster
Dueling Bandits with Adversarial Sleeping
Aadirupa Saha · Pierre Gaillard
We introduce the problem of sleeping dueling bandits with stochastic preferences and adversarial availabilities (DB-SPAA). In almost all dueling bandit applications, the decision space often changes over time; eg, retail store management, online shopping, restaurant recommendation, search engine optimization, etc. Surprisingly, this `sleeping aspect' of dueling bandits has never been studied in the literature. Like dueling bandits, the goal is to compete with the best arm by sequentially querying the preference feedback of item pairs. The non-triviality however results due to the non-stationary item spaces that allow any arbitrary subsets items to go unavailable every round. The goal is to find an optimal `no-regret policy that can identify the best available item at each round, as opposed to the standard `fixed best-arm regret objective' of dueling bandits. We first derive an instance-specific lower bound for DB-SPAA $\Omega( \sum_{i =1}^{K-1}\sum_{j=i+1}^K \frac{\log T}{\Delta(i,j)})$, where $K$ is the number of items and $\Delta(i,j)$ is the gap between items $i$ and $j$. This indicates that the sleeping problem with preference feedback is inherently more difficult than that for classical multi-armed bandits (MAB). We then propose two algorithms, with near optimal regret guarantees. Our results are corroborated empirically.
Author Information
Aadirupa Saha (Microsoft Research)
Aadirupa Saha is a PhD student at the department of Computer Science and Automation (CSA), Indian Institute of Science (IISc), Bangalore and was a research intern at Google, Mountain View, CA (June-Sept, 2019). Her research interests broadly lie in the areas of Machine Learning, Statistical Learning Theory and Optimization. Her current research specifically focuses on decision making under uncertainty from sequential data, reinforcement learning, and preference based rank aggregation problems.
Pierre Gaillard (Inria)
More from the Same Authors
-
2021 Spotlight: Mixability made efficient: Fast online multiclass logistic regression »
Rémi Jézéquel · Pierre Gaillard · Alessandro Rudi -
2022 : Distributed Online and Bandit Convex Optimization »
Kumar Kshitij Patel · Aadirupa Saha · Nati Srebro · Lingxiao Wang -
2023 Poster: Provably Efficient Personalized Multi-Objective Decision Making via Comparative Feedback »
Han Shao · Lee Cohen · Avrim Blum · Yishay Mansour · Aadirupa Saha · Matthew Walter -
2021 Poster: Mixability made efficient: Fast online multiclass logistic regression »
Rémi Jézéquel · Pierre Gaillard · Alessandro Rudi -
2021 Oral: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms »
Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor -
2021 Poster: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms »
Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor -
2021 Poster: Optimal Algorithms for Stochastic Contextual Preference Bandits »
Aadirupa Saha -
2020 Poster: Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model »
Raphaël Berthier · Francis Bach · Pierre Gaillard -
2019 Poster: Combinatorial Bandits with Relative Feedback »
Aadirupa Saha · Aditya Gopalan -
2019 Poster: Efficient online learning with kernels for adversarial large scale problems »
Rémi Jézéquel · Pierre Gaillard · Alessandro Rudi -
2018 Poster: Efficient online algorithms for fast-rate regret bounds under sparsity »
Pierre Gaillard · Olivier Wintenberger