

Poster

Optimizing the coalition gain in Online Auctions with Greedy Structured Bandits

Dorian Baudry · Hugo Richard · Maria Cherifa · Vianney Perchet · Clément Calauzènes

West Ballroom A-D #6903
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Motivated by online display advertising, this work considers repeated second-price auctions in which agents sample their values from an unknown distribution with cumulative distribution function F. In each auction t, a decision-maker bound by limited observations selects n_t agents from a coalition of N to compete for a prize against p other agents, aiming to maximize the cumulative reward of the coalition across all auctions. The problem is framed as an N-armed structured bandit, where each number n of players sent is an arm, with expected reward r(n) fully characterized by F and p+n. We present two algorithms, Local-Greedy (LG) and Greedy-Grid (GG), both achieving *constant* problem-dependent regret. This relies on three key ingredients: **1.** an estimator of r(n) built from feedback collected from any arm k, **2.** concentration bounds for these estimates when k lies within an estimation neighborhood of n, and **3.** the unimodality of r under standard assumptions on F. Additionally, GG enjoys problem-independent guarantees on top of the best problem-dependent guarantees. However, by avoiding reliance on confidence intervals, LG outperforms GG in practice, as well as standard unimodal bandit algorithms such as OSUB and multi-armed bandit algorithms.
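To make the setting concrete, the following is a minimal simulation sketch, not the paper's actual LG or GG algorithm: it assumes F is Uniform(0,1), simulates the coalition's gain in a second-price auction when n of its agents compete against p outsiders, and hill-climbs on empirical mean rewards of neighboring arms, relying on the unimodality of r(n) the way a local-greedy scheme would. All function names and parameters here are illustrative.

```python
import random

def coalition_reward(n, p, rng):
    """One simulated second-price auction: n coalition agents vs p outsiders,
    all values i.i.d. Uniform(0,1) (an illustrative choice of F)."""
    coalition = [rng.random() for _ in range(n)]
    outsiders = [rng.random() for _ in range(p)]
    ranked = sorted(coalition + outsiders, reverse=True)
    winner, second_price = ranked[0], ranked[1]
    # The coalition gains (winner's value - second price) only if one of
    # its own agents wins; otherwise it gains nothing.
    return winner - second_price if max(coalition) >= winner else 0.0

def local_greedy_sketch(N, p, rounds, rng):
    """Toy local-greedy heuristic: repeatedly sample the current arm and its
    neighbors, then move to the neighbor with the best empirical mean.
    Under unimodal r(n), such local moves drift toward the best arm."""
    sums = [0.0] * (N + 1)
    counts = [0] * (N + 1)
    n = 1
    for _ in range(rounds):
        neighborhood = {max(1, n - 1), n, min(N, n + 1)}
        for k in neighborhood:
            sums[k] += coalition_reward(k, p, rng)
            counts[k] += 1
        n = max(neighborhood, key=lambda k: sums[k] / counts[k])
    return n

rng = random.Random(0)
best_arm = local_greedy_sketch(N=8, p=3, rounds=2000, rng=rng)
print(best_arm)
```

The trade-off the simulation exposes is the one driving unimodality: sending more coalition agents raises the chance the coalition wins, but coalition members also compete against each other and push up the second price, so the expected gain r(n) first rises then falls.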
