

Poster

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Chloé Rouyer · Dirk van der Hoeven · Nicolò Cesa-Bianchi · Yevgeny Seldin

Hall J (level 1) #839

Keywords: [ Beyond Bandits ] [ Online Learning ] [ feedback graphs ]


Abstract: We consider online learning with feedback graphs, a sequential decision-making framework in which the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big((\ln T)^2 \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}\big)$, where $\mathcal{I}(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and the $\Delta_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
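For intuition about the feedback-graph loss estimator that EXP3.G-style methods build on, below is a minimal exponential-weights sketch. It is not the paper's algorithm (which additionally uses EXP3++-style gap estimates and the novel graph-aware exploration scheme); the learning rate `eta`, the `neighbors` dictionary, and the toy loss data are all illustrative assumptions.

```python
import numpy as np

def exp3_feedback_graph(losses, neighbors, eta=0.05, rng=None):
    """Simplified exponential-weights learner with graph feedback.

    losses:    (T, K) array of per-round losses in [0, 1] (toy data).
    neighbors: dict mapping each action j to the set of actions whose
               losses are observed when j is played; we assume self-loops
               (j in neighbors[j]), i.e. a strongly observable graph.
    eta:       fixed learning rate (a tuning assumption, not the
               paper's adaptive schedule).
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    log_w = np.zeros(K)
    total_loss = 0.0
    for t in range(T):
        # Sampling distribution from the (log-domain) weights.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        total_loss += losses[t, a]
        # Probability of observing each action's loss under the graph:
        # obs_prob[i] = sum of p[j] over actions j that reveal i.
        obs_prob = np.array([sum(p[j] for j in range(K) if i in neighbors[j])
                             for i in range(K)])
        # Importance-weighted loss estimates; unbiased since every
        # observed action i has obs_prob[i] >= p[a] > 0.
        loss_hat = np.zeros(K)
        for i in neighbors[a]:
            loss_hat[i] = losses[t, i] / obs_prob[i]
        log_w -= eta * loss_hat
    return total_loss

# Toy usage: 3 actions where playing action 0 also reveals action 1.
neighbors = {0: {0, 1}, 1: {1}, 2: {2}}
losses = np.random.default_rng(1).uniform(size=(1000, 3))
print(exp3_feedback_graph(losses, neighbors))
```

The key point the sketch illustrates is that the estimator divides by the probability of *observing* an action's loss rather than of *playing* it, which is what lets the regret scale with the independence number $\alpha$ instead of the number of actions.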
