Timezone: »
Oral
The Price of Bandit Information for Online Optimization
Varsha Dani · Thomas P Hayes · Sham M Kakade
In the online linear optimization problem, a learner must choose, in each round, a decision from a set $D \subset \R^n$ in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and in the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the \emph{price of bandit information} --- how much worse the regret is in the bandit case as compared to the full information case. For the full information case, the upper bound on the regret is $O^*(\sqrt{nT})$, where $n$ is the ambient dimension and $T$ is the time horizon. For the bandit case, we present an algorithm which achieves $O^*(n^{3/2}\sqrt{T})$ regret --- all previous (nontrivial) bounds here were $O(\textrm{poly}(n)T^{2/3})$ or worse. It is striking that the convergence rate for the bandit setting is only a factor of $n$ worse than in the full information case --- in stark contrast to the $K$-arm bandit setting, where the gap in the dependence on $K$ is exponential ($\sqrt{TK}$ vs. $\sqrt{T\log K}$). We also present lower bounds showing that this gap is at least $\sqrt{n}$, which we conjecture to be the correct order. The bandit algorithm we present can be implemented efficiently in special cases of particular interest, such as path planning and Markov Decision Problems.
Author Information
Varsha Dani (University of Chicago)
Thomas P Hayes (Toyota Technological Institute at Chicago)
Sham M Kakade (Harvard University & Amazon)
Related Events (a corresponding poster, oral, or spotlight)
-
2007 Poster: The Price of Bandit Information for Online Optimization »
Tue. Dec 4th 06:30 -- 06:40 PM Room
More from the Same Authors
-
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A »
Sham M Kakade · Martha White · Nicolas Le Roux -
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning »
Sham M Kakade · Martha White · Nicolas Le Roux -
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade -
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade -
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu -
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2011 Poster: Stochastic convex optimization with bandit feedback »
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin -
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang -
2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression »
Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir -
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade -
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade -
2009 Poster: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2009 Oral: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2008 Poster: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization »
Sham M Kakade · Karthik Sridharan · Ambuj Tewari