Poster
Oracle-Efficient Algorithms for Online Linear Optimization with Bandit Feedback
Shinji Ito · Daisuke Hatano · Hanna Sumita · Kei Takemura · Takuro Fukunaga · Naonori Kakimura · Ken-Ichi Kawarabayashi
East Exhibition Hall B, C #20
Keywords: [ Bandit Algorithms ] [ Algorithms -> Online Learning ] [ Optimization -> Convex Optimization ] [ Optimization -> Combinatorial Optimization ]
Abstract:
We propose computationally efficient algorithms for \textit{online linear optimization with bandit feedback}, in which a player chooses an \textit{action vector} from a given (possibly infinite) set A ⊆ R^d and then suffers a loss that is a linear function of the chosen action vector. Although existing algorithms achieve an optimal regret bound of Õ(√T) for T rounds (ignoring factors of poly(d, log T)), computationally efficient ways of implementing them have not yet been specified, in particular when |A| is not bounded by a polynomial in d. A standard way to pursue computational efficiency is to assume access to an efficient algorithm, referred to as an \textit{oracle}, that solves (offline) linear optimization problems over A. Under this assumption, the computational efficiency of a bandit algorithm can be measured in terms of its \textit{oracle complexity}, i.e., the number of oracle calls. Our contribution is to propose algorithms that achieve the optimal regret bound of Õ(√T) together with low oracle complexity, in both \textit{non-stochastic settings} and \textit{stochastic settings}. Our algorithm for non-stochastic settings has an oracle complexity of Õ(T) and is the first algorithm to achieve both a regret bound of Õ(√T) and an oracle complexity of Õ(poly(T)) given only linear optimization oracles. Our algorithm for stochastic settings calls the oracle only O(poly(d, log T)) times, which is smaller than the current best oracle complexity of O(T) when T is sufficiently large.
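To make the oracle abstraction concrete, here is a minimal sketch (in Python, with illustrative names not taken from the paper) of a linear optimization oracle over an action set A, with a counter that tracks oracle complexity. A finite A is used as a stand-in; the paper allows A to be infinite so long as such an oracle is available.

```python
import numpy as np

class LinearOptimizationOracle:
    """Offline linear optimization oracle over an action set A:
    given a loss direction theta, returns argmin_{a in A} <theta, a>.
    The `calls` counter measures oracle complexity (number of oracle calls).
    Illustrative sketch only; the paper assumes such an oracle abstractly.
    """

    def __init__(self, actions):
        # Rows of `actions` are the action vectors in R^d.
        self.actions = np.asarray(actions, dtype=float)
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        scores = self.actions @ np.asarray(theta, dtype=float)
        return self.actions[np.argmin(scores)]

# Toy example: A = vertices of the unit square in R^2.
oracle = LinearOptimizationOracle([[0, 0], [0, 1], [1, 0], [1, 1]])
best = oracle(np.array([1.0, -2.0]))  # minimizes a[0] - 2*a[1] over A
print(best, oracle.calls)  # -> [0. 1.] 1
```

A bandit algorithm in this model interacts with the environment each round but may consult `oracle` only sparingly; the paper's results bound how often such calls are needed while keeping regret Õ(√T).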