

Poster

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Chloé Rouyer · Dirk van der Hoeven · Nicolò Cesa-Bianchi · Yevgeny Seldin

Hall J (level 1) #839

Keywords: [ Beyond Bandits ] [ Online Learning ] [ feedback graphs ]


Abstract: We consider online learning with feedback graphs, a sequential decision-making framework in which the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big((\ln T)^2 \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}\big)$, where $\mathcal{I}(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and the $\Delta_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
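For intuition about the feedback-graph loss estimator that EXP3.G-style methods build on, below is a minimal exponential-weights sketch. It is not the paper's algorithm (which additionally uses EXP3++-style gap estimates and the novel graph-aware exploration scheme); the learning rate `eta`, the `neighbors` dictionary, and the toy loss data are all illustrative assumptions.

```python
import numpy as np

def exp3_feedback_graph(losses, neighbors, eta=0.05, rng=None):
    """Simplified exponential-weights learner with graph feedback.

    losses:    (T, K) array of per-round losses in [0, 1] (toy data).
    neighbors: dict mapping each action j to the set of actions whose
               losses are observed when j is played; we assume self-loops
               (j in neighbors[j]), i.e. a strongly observable graph.
    eta:       fixed learning rate (a tuning assumption, not the
               paper's adaptive schedule).
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    log_w = np.zeros(K)
    total_loss = 0.0
    for t in range(T):
        # Sampling distribution from the (log-domain) weights.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        total_loss += losses[t, a]
        # Probability of observing each action's loss under the graph:
        # obs_prob[i] = sum of p[j] over actions j that reveal i.
        obs_prob = np.array([sum(p[j] for j in range(K) if i in neighbors[j])
                             for i in range(K)])
        # Importance-weighted loss estimates; unbiased since every
        # observed action i has obs_prob[i] >= p[a] > 0.
        loss_hat = np.zeros(K)
        for i in neighbors[a]:
            loss_hat[i] = losses[t, i] / obs_prob[i]
        log_w -= eta * loss_hat
    return total_loss

# Toy usage: 3 actions where playing action 0 also reveals action 1.
neighbors = {0: {0, 1}, 1: {1}, 2: {2}}
losses = np.random.default_rng(1).uniform(size=(1000, 3))
print(exp3_feedback_graph(losses, neighbors))
```

The key point the sketch illustrates is that the estimator divides by the probability of *observing* an action's loss rather than of *playing* it, which is what lets the regret scale with the independence number $\alpha$ instead of the number of actions.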
