Timezone: »
Distributed Differential Privacy in Multi-Armed Bandits
Sayak Ray Chowdhury · Xingyu Zhou
Event URL: https://openreview.net/forum?id=aEOd6wh5GA »
We consider the standard $K$-armed bandit problem under a distributed trust model of differential privacy (DP), which enables to guarantee privacy without a trustworthy server. Under this trust model, previous work largely focus on achieving privacy using a shuffle protocol, where a batch of users data are randomly permuted before sending to a central server. This protocol achieves ($\epsilon,\delta$) or approximate-DP guarantee by sacrificing an additive $O\!\left(\!\frac{K\log T\sqrt{\log(1/\delta)}}{\epsilon}\!\right)\!$ factor in $T$-step cumulative regret. In contrast, the optimal privacy cost to achieve a stronger ($\epsilon,0$) or pure-DP guarantee under the widely used central trust model is only $\Theta\!\left(\!\frac{K\log T}{\epsilon}\!\right)\!$, where, however, a trusted server is required. In this work, we aim to obtain a pure-DP guarantee under distributed trust model while sacrificing no more regret than that under central trust model. We achieve this by designing a generic bandit algorithm based on successive arm elimination, where privacy is guaranteed by corrupting rewards with an equivalent discrete Laplace noise ensured by a secure computation protocol. We numerically simulate regret performance of our algorithm, which corroborate our theoretical findings.
We consider the standard $K$-armed bandit problem under a distributed trust model of differential privacy (DP), which enables to guarantee privacy without a trustworthy server. Under this trust model, previous work largely focus on achieving privacy using a shuffle protocol, where a batch of users data are randomly permuted before sending to a central server. This protocol achieves ($\epsilon,\delta$) or approximate-DP guarantee by sacrificing an additive $O\!\left(\!\frac{K\log T\sqrt{\log(1/\delta)}}{\epsilon}\!\right)\!$ factor in $T$-step cumulative regret. In contrast, the optimal privacy cost to achieve a stronger ($\epsilon,0$) or pure-DP guarantee under the widely used central trust model is only $\Theta\!\left(\!\frac{K\log T}{\epsilon}\!\right)\!$, where, however, a trusted server is required. In this work, we aim to obtain a pure-DP guarantee under distributed trust model while sacrificing no more regret than that under central trust model. We achieve this by designing a generic bandit algorithm based on successive arm elimination, where privacy is guaranteed by corrupting rewards with an equivalent discrete Laplace noise ensured by a secure computation protocol. We numerically simulate regret performance of our algorithm, which corroborate our theoretical findings.
Author Information
Sayak Ray Chowdhury (Indian Institute of Science)
Xingyu Zhou (Wayne State University)
More from the Same Authors
-
2022 Poster: On Kernelized Multi-Armed Bandits with Constraints »
Xingyu Zhou · Bo Ji -
2022 Poster: Provably Efficient Model-Free Constrained RL with Linear Function Approximation »
Arnob Ghosh · Xingyu Zhou · Ness Shroff -
2019 Poster: Bayesian Optimization under Heavy-tailed Payoffs »
Sayak Ray Chowdhury · Aditya Gopalan -
2019 Spotlight: Bayesian Optimization under Heavy-tailed Payoffs »
Sayak Ray Chowdhury · Aditya Gopalan -
2018 : Poster Session »
Lorenzo Masoero · Tammo Rukat · Runjing Liu · Sayak Ray Chowdhury · Daniel Coelho de Castro · Claudia Wehrhahn · Feras Saad · Archit Verma · Kelvin Hsu · Irineo Cabreros · Sandhya Prabhakaran · Yiming Sun · Maxime Rischard · Linfeng Liu · Adam Farooq · Jeremiah Liu · Melanie F. Pradier · Diego Romeres · Neill Campbell · Kai Xu · Mehmet M Dundar · Tucker Keuter · Prashnna Gyawali · Eli Sennesh · Alessandro De Palma · Daniel Flam-Shepherd · Takatomi Kubo