Tue Dec 5th 04:50 -- 05:05 PM @ Hall C
Communication-Efficient Distributed Learning of Discrete Distributions
We initiate a systematic study of distribution learning (or density estimation) in the distributed model. In this problem the data drawn from an unknown distribution is partitioned across multiple machines. The machines must succinctly communicate with a referee so that in the end the referee can estimate the underlying distribution of the data. The problem is motivated by the pressing need to build communication-efficient protocols in various distributed systems, where power consumption or limited bandwidth impose stringent communication constraints. We give the first upper and lower bounds on the communication complexity of nonparametric density estimation of discrete probability distributions under both l1 and the l2 distances. Specifically, our results include the following: 1. In the case when the unknown distribution is arbitrary and each machine has only one sample, we show that any interactive protocol that learns the distribution must essentially communicate the entire sample. 2. In the case of structured distributions, such as $k$-histograms and monotone, we design distributed protocols that achieve better communication guarantees than the trivial ones, and show tight bounds in some regimes.