Poster
Optimizing the coalition gain in Online Auctions with Greedy Structured Bandits
Dorian Baudry · Hugo Richard · Maria Cherifa · Vianney Perchet · Clément Calauzènes
West Ballroom A-D #6903
Abstract:
Motivated by online display advertising, this work considers repeated second-price auctions, where agents sample their value from an unknown distribution with cumulative distribution function . In each auction , a decision-maker bound by limited observations selects agents from a coalition of to compete for a prize with other agents, aiming to maximize the cumulative reward of the coalition across all auctions.The problem is framed as an -armed structured bandit, each number of player sent being an arm , with expected reward fully characterized by and . We present two algorithms, Local-Greedy (LG) and Greedy-Grid (GG), both achieving *constant* problem-dependent regret. This relies on three key ingredients: **1.** an estimator of from feedback collected from any arm , **2.** concentration bounds of these estimates for within an estimation neighborhood of and **3.** the unimodality property of under standard assumptions on . Additionally, GG exhibits problem-independent guarantees on top of best problem-dependent guarantees. However, by avoiding to rely on confidence intervals, LG practically outperforms GG, as well as standard unimodal bandit algorithms such as OSUB or multi-armed bandit algorithms.
Chat is not available.