
Combinatorial Bandits with Relative Feedback
Aadirupa Saha · Aditya Gopalan

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #18
We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, rather than absolute bandit or semi-bandit feedback. Specifically, we study two regret minimisation problems over subsets of a finite ground set $[n]$, with subset-wise relative preference information feedback according to the Multinomial logit choice model. In the first setting, the learner can play subsets of size bounded by a maximum size and receives top-$m$ rank-ordered feedback, while in the second setting the learner can play subsets of a fixed size $k$ with a full subset ranking observed as feedback. For both settings, we devise instance-dependent, order-optimal algorithms with regret $O(\frac{n}{m} \ln T)$ and $O(\frac{n}{k} \ln T)$, respectively. We derive fundamental limits on the regret performance of online learning with subset-wise preferences, proving the tightness of our regret guarantees. Our results also show the value of eliciting more general top-$m$ rank-ordered feedback over single-winner feedback ($m=1$). Our theoretical results are corroborated with empirical evaluations.
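The top-$m$ rank-ordered MNL feedback described above can be sketched as sequential sampling from the Plackett-Luce/multinomial-logit model: the environment repeatedly draws a "winner" from the offered subset with probability proportional to its utility weight, removes it, and repeats $m$ times. This is a minimal illustrative sketch, assuming a hypothetical per-item weight parameterization; the function name and interface are not from the paper.

```python
import numpy as np

def mnl_top_m_feedback(weights, subset, m, rng=None):
    """Sample top-m rank-ordered feedback from a played subset under the
    multinomial logit (Plackett-Luce) choice model.

    weights : positive MNL utility scores for all n items (hypothetical
              parameterization, for illustration only).
    subset  : indices of the items offered by the learner.
    m       : number of ranked positions revealed as feedback.
    """
    rng = np.random.default_rng() if rng is None else rng
    remaining = list(subset)
    ranking = []
    for _ in range(min(m, len(remaining))):
        w = np.array([weights[i] for i in remaining], dtype=float)
        probs = w / w.sum()  # MNL: P(item i chosen) proportional to weights[i]
        winner = rng.choice(len(remaining), p=probs)
        ranking.append(remaining.pop(winner))
    return ranking
```

With $m$ equal to the subset size this yields the full-ranking feedback of the second setting; with $m=1$ it reduces to single-winner feedback.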

Author Information

Aadirupa Saha (Indian Institute of Science)

Aadirupa Saha is a PhD student at the Department of Computer Science and Automation (CSA), Indian Institute of Science (IISc), Bangalore, and was a research intern at Google, Mountain View, CA (June-Sept 2019). Her research interests broadly lie in the areas of machine learning, statistical learning theory, and optimization. Her current research specifically focuses on decision making under uncertainty from sequential data, reinforcement learning, and preference-based rank aggregation problems.

Aditya Gopalan (Indian Institute of Science)

More from the Same Authors

  • 2022 : Distributed Online and Bandit Convex Optimization »
    Kumar Kshitij Patel · Aadirupa Saha · Nati Srebro · Lingxiao Wang
  • 2021 Poster: Dueling Bandits with Adversarial Sleeping »
    Aadirupa Saha · Pierre Gaillard
  • 2021 Poster: Optimal Algorithms for Stochastic Contextual Preference Bandits »
    Aadirupa Saha
  • 2019 : Coffee break, posters, and 1-on-1 discussions »
    Julius von Kügelgen · David Rohde · Candice Schumann · Grace Charles · Victor Veitch · Vira Semenova · Mert Demirer · Vasilis Syrgkanis · Suraj Nair · Aahlad Puli · Masatoshi Uehara · Aditya Gopalan · Yi Ding · Ignavier Ng · Khashayar Khosravi · Eli Sherman · Shuxi Zeng · Aleksander Wieczorek · Hao Liu · Kyra Gan · Jason Hartford · Miruna Oprescu · Alexander D'Amour · Jörn Boehnke · Yuta Saito · Théophile Griveau-Billion · Chirag Modi · Shyngys Karimov · Jeroen Berrevoets · Logan Graham · Imke Mayer · Dhanya Sridhar · Issa Dahabreh · Alan Mishler · Duncan Wadsworth · Khizar Qureshi · Rahul Ladhania · Gota Morishita · Paul Welle
  • 2019 Poster: Bayesian Optimization under Heavy-tailed Payoffs »
    Sayak Ray Chowdhury · Aditya Gopalan
  • 2019 Spotlight: Bayesian Optimization under Heavy-tailed Payoffs »
    Sayak Ray Chowdhury · Aditya Gopalan
  • 2018 : Spotlights 2 »
    Aditya Gopalan · Sungjoon Choi · Thomas Ringstrom · Roy Fox · Jonas Degrave · Xiya Cao · Karl Pertsch · Maximilian Igl · Brian Ichter
  • 2018 : Poster Session 1 »
    Kyle H Ambert · Brandon Araki · Xiya Cao · Sungjoon Choi · Hao(Jackson) Cui · Jonas Degrave · Yaqi Duan · Mattie Fellows · Carlos Florensa · Karan Goel · Aditya Gopalan · Ming-Xu Huang · Jonathan Hunt · Cyril Ibrahim · Brian Ichter · Maximilian Igl · Zheng Tracy Ke · Igor Kiselev · Anuj Mahajan · Arash Mehrjou · Karl Pertsch · Alexandre Piche · Nicholas Rhinehart · Thomas Ringstrom · Reazul Hasan Russel · Oleh Rybkin · Ion Stoica · Sharad Vikram · Angelina Wang · Ting-Han Wei · Abigail H Wen · I-Chen Wu · Zhengwei Wu · Linhai Xie · Dinghan Shen