

Poster

Improved Analysis for Bandit Learning in Matching Markets

Fang Kong · Zilong Wang · Shuai Li

West Ballroom A-D #5702
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: A rich line of work studies the bandit learning problem in two-sided matching markets, where participants on one side of the market (players) are uncertain about their preferences and hope to find a stable matching through iterative interactions with the other side (arms). The state-of-the-art analysis shows that the player-optimal stable regret is of order $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap. However, this result may be far from the lower bound $\Omega(\max\{N\log T/\Delta^2, K\log T/\Delta\})$, since the number $K$ of arms (workers, publisher slots) may be much larger than the number $N$ of players (employers in labor markets, advertisers in online advertising, respectively). In this paper, we propose a new algorithm and show that the regret can be upper bounded by $O(N^2\log T/\Delta^2 + K\log T/\Delta)$. This result removes the dependence on $K$ in the leading-order term and improves on the state-of-the-art guarantee in the common case where $N$ is much smaller than $K$. This advantage is also verified in experiments. In addition, we prove an improved $O(N\log T/\Delta^2 + K\log T/\Delta)$ regret bound for the existing centralized UCB algorithm under the $\alpha$-condition, which matches the lower bound up to a difference in the gap definition.
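To see the scale of the improvement: with, say, $N = 10$ players and $K = 1000$ arms, the dominant $\log T/\Delta^2$ term shrinks from a factor of $K = 1000$ to $N^2 = 100$, at the cost of an additional $K\log T/\Delta$ term that is lower order in $1/\Delta$.

As a rough illustration of the centralized scheme the final result refers to (not the paper's new algorithm), the following is a minimal Python sketch: a platform maintains a UCB estimate for each (player, arm) pair and, each round, plays the player-proposing Gale-Shapley matching computed on those estimates. All names, the $\sqrt{2\log t/n}$ bonus, the Gaussian noise, and the assumptions of known arm-side preferences and $N \le K$ are illustrative choices, not details from the paper.

```python
import numpy as np

def gale_shapley(player_prefs, arm_prefs, N, K):
    """Player-proposing Gale-Shapley (assumes N <= K, so every player
    is eventually matched). Returns the player-optimal stable matching.

    player_prefs[p]: arms ordered from player p's most to least preferred.
    arm_prefs[a][p]: arm a's rank of player p (lower = more preferred).
    """
    next_prop = [0] * N        # index of the next arm each player proposes to
    arm_match = [None] * K     # tentative partner of each arm
    free = list(range(N))
    while free:
        p = free.pop()
        a = player_prefs[p][next_prop[p]]
        next_prop[p] += 1
        if arm_match[a] is None:
            arm_match[a] = p
        elif arm_prefs[a][p] < arm_prefs[a][arm_match[a]]:
            free.append(arm_match[a])   # arm a prefers p; displace the holder
            arm_match[a] = p
        else:
            free.append(p)              # p is rejected and proposes again later

    player_match = [None] * N
    for a, p in enumerate(arm_match):
        if p is not None:
            player_match[p] = a
    return player_match

def centralized_ucb(true_means, arm_prefs, T, rng):
    """Each round: build UCB estimates per (player, arm), run
    player-proposing Gale-Shapley on them, and play that matching."""
    N, K = true_means.shape
    counts = np.zeros((N, K))
    sums = np.zeros((N, K))
    for t in range(1, T + 1):
        with np.errstate(divide="ignore", invalid="ignore"):
            means = np.where(counts > 0, sums / counts, 0.0)
            bonus = np.sqrt(2.0 * np.log(t) / counts)      # inf if unexplored
        ucb = np.where(counts > 0, means + bonus, np.inf)
        # Each player ranks arms by UCB, highest first (ties broken arbitrarily).
        player_prefs = [list(np.argsort(-ucb[p])) for p in range(N)]
        matching = gale_shapley(player_prefs, arm_prefs, N, K)
        for p, a in enumerate(matching):
            reward = true_means[p, a] + rng.normal(0.0, 0.1)  # noisy feedback
            counts[p, a] += 1
            sums[p, a] += reward
    return counts

# Tiny demo with random preferences (illustrative only).
rng = np.random.default_rng(0)
N, K = 3, 6
true_means = rng.uniform(size=(N, K))
arm_prefs = [rng.permutation(N) for _ in range(K)]  # arms' fixed, known rankings
counts = centralized_ucb(true_means, arm_prefs, T=2000, rng=rng)
print(counts)  # pull counts should concentrate on each player's stable partner
```

In this sketch the arms' preferences are fixed and known to the platform, so all learning is on the player side; the decentralized setting studied in the wider literature, where players cannot share information, requires substantially more care.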
