Skip to yearly menu bar Skip to main content


Poster

Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

Artin Tajdini · Lalit Jain · Kevin Jamieson

West Ballroom A-D #5802
[ ]
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: We consider maximizing an unknown monotonic, submodular set function f:2[n][0,1] with cardinality constraint under stochastic bandit feedback. At each time t=1,,T the learner chooses a set St[n] with |St|k and receives reward f(St)+ηt where ηt is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum f(S) with |S|=k, obtained through robust greedy maximization of f. To date, the best regret bound in the literature scales as kn1/3T2/3. And by trivially treating every set as a unique arm one deduces that (nk)T is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like Ω~(minLk(L1/3n1/3T2/3+(nkL)T)). For a slightly restricted algorithm class, we prove a stronger regret lower bound of Ω~(minLk(Ln1/3T2/3+(nkL)T)). Moreover, we propose an algorithm Sub-UCB that achieves regret O~(minLk(Ln1/3T2/3+(nkL)T)) capable of matching the lower bound on regret for the restricted class up to logarithmic factors.

Chat is not available.