NeurIPS Poster Federated Linear Bandits with Finite Adversarial Actions

Poster

Federated Linear Bandits with Finite Adversarial Actions

Li Fan · Ruida Zhou · Chao Tian · Cong Shen

Great Hall & Hall B1+B2 (level 1) #1904

[ Abstract ]

[ Paper] [ Slides] [ Poster] [ OpenReview]

Abstract: We study a federated linear bandits model, where

M

$M$ clients communicate with a central server to solve a linear contextual bandits problem with finite adversarial action sets that may be different across clients. To address the unique challenges of **adversarial finite** action sets, we propose the FedSupLinUCB algorithm, which extends the principles of SupLinUCB and OFUL algorithms in linear contextual bandits. We prove that FedSupLinUCB achieves a total regret of

\tilde{O} (\sqrt{d T})

$\tilde{O}(\sqrt{d T})$ , where

T

$T$ is the total number of arm pulls from all clients, and

d

$d$ is the ambient dimension of the linear model. This matches the minimax lower bound and thus is order-optimal (up to polylog terms). We study both asynchronous and synchronous cases and show that the communication cost can be controlled as

O (d M^{2} \log (d) \log (T))

$O(d M^2 \log(d)\log(T))$ and

O (\sqrt{d^{3} M^{3}} \log (d))

$O(\sqrt{d^3 M^3} \log(d))$ , respectively. The FedSupLinUCB design is further extended to two scenarios: (1) variance-adaptive, where a total regret of

\tilde{O} (\sqrt{d \sum_{t = 1}^{T} σ_{t}^{2}})

$\tilde{O} (\sqrt{d \sum \nolimits_{t=1}^{T} \sigma_t^2})$ can be achieved with

σ_{t}^{2}

$\sigma_t^2$ being the noise variance of round

t

$t$ ; and (2) adversarial corruption, where a total regret of

\tilde{O} (\sqrt{d T} + d C_{p})

$\tilde{O}(\sqrt{dT} + d C_p)$ can be achieved with

C_{p}

$C_p$ being the total corruption budget. Experiment results corroborate the theoretical analysis and demonstrate the effectiveness of \alg on both synthetic and real-world datasets.

Chat is not available.