Oracle-Efficient Algorithms for Online Linear Optimization with Bandit Feedback
Shinji Ito · Daisuke Hatano · Hanna Sumita · Kei Takemura · Takuro Fukunaga · Naonori Kakimura · Ken-Ichi Kawarabayashi
Keywords: [ Bandit Algorithms ] [ Algorithms ] [ Convex Optimization ] [ Online Learning ] [ Combinatorial Optimization ]
2019 Poster
Abstract:
We propose computationally efficient algorithms for \textit{online linear optimization with bandit feedback}, in which a player chooses an \textit{action vector} from a given (possibly infinite) set $A \subseteq \mathbb{R}^d$, and then suffers a loss that can be expressed as a linear function in action vectors. Although existing algorithms achieve an optimal regret bound of $\tilde{O}(\sqrt{T})$ for $T$ rounds (ignoring factors of $\mathrm{poly}(d, \log T)$), computationally efficient ways of implementing them have not yet been specified, in particular when $|A|$ is not bounded by a polynomial size in $d$. A standard way to pursue computational efficiency is to assume that we have an efficient algorithm, referred to as an \textit{oracle}, that solves (offline) linear optimization problems over $A$. Under this assumption, the computational efficiency of a bandit algorithm can then be measured in terms of \textit{oracle complexity}, i.e., the number of oracle calls. Our contribution is to propose algorithms that offer optimal regret bounds of $\tilde{O}(\sqrt{T})$ as well as low oracle complexity for both \textit{non-stochastic settings} and \textit{stochastic settings}. Our algorithm for non-stochastic settings has an oracle complexity of $\tilde{O}(T)$ and is the first algorithm that achieves both a regret bound of $\tilde{O}(\sqrt{T})$ and an oracle complexity of $\tilde{O}(\mathrm{poly}(T))$, given only linear optimization oracles. Our algorithm for stochastic settings calls the oracle only $O(\mathrm{poly}(d) \log T)$ times, which is smaller than the current best oracle complexity of $O(T)$ if $T$ is sufficiently large.
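To make the oracle model concrete, here is a minimal sketch (not the paper's algorithm) of what a linear optimization oracle over a combinatorial action set looks like, and how oracle complexity is simply the number of calls to it. The action set, the toy explore-then-exploit loop, and all names are illustrative assumptions, not from the paper.

```python
import numpy as np

# Hypothetical action set A: vertices of the 3-dimensional hypercube.
# In the paper's setting |A| may be exponential in d; the algorithm only
# touches A through an offline linear optimization oracle.
ACTIONS = np.array([[(b >> i) & 1 for i in range(3)] for b in range(8)], dtype=float)

class LinearOptimizationOracle:
    """Offline oracle: given a loss vector c, return argmin_{a in A} <c, a>.
    The call counter makes the notion of oracle complexity concrete."""
    def __init__(self, actions):
        self.actions = actions
        self.calls = 0

    def __call__(self, loss):
        self.calls += 1
        return self.actions[np.argmin(self.actions @ loss)]

rng = np.random.default_rng(0)
oracle = LinearOptimizationOracle(ACTIONS)
true_loss = rng.normal(size=3)   # unknown linear loss (stochastic setting)
estimate = np.zeros(3)

# Toy explore-then-exploit loop, purely for illustration: gather bandit
# feedback (only the scalar loss of the chosen action is observed), form a
# crude estimate of the loss vector, then make a single oracle call.
for t in range(1, 101):
    a = ACTIONS[rng.integers(len(ACTIONS))]          # explore uniformly
    feedback = a @ true_loss + 0.1 * rng.normal()    # bandit feedback: scalar only
    estimate += (feedback * a - estimate) / t        # crude running estimate
best = oracle(estimate)                              # one oracle call

print("oracle calls:", oracle.calls)
```

The point of the counter is that run time is measured not in arithmetic operations over $A$ (which may be huge) but in how many times the oracle is invoked; the paper's contribution is keeping that count low while retaining optimal regret.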