Timezone: »
We study the problem of discrete distribution testing in the twoparty setting. For example, in the standard closeness testing problem, Alice and Bob each have t samples from, respectively, distributions a and b over [n], and they need to test whether a=b or a,b are εfar (in the ℓ1 distance) for some fixed ε>0. This is in contrast to the wellstudied oneparty case, where the tester has unrestricted access to samples of both distributions, for which optimal bounds are known for a number of variations. Despite being a natural constraint in applications, the twoparty setting has evaded attention so far.
We address two fundamental aspects of the twoparty setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other's input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have t samples from distributions a and b respectively, which may be correlated; the question is whether a,b are independent of εfar from being independent.
Our contribution is threefold:
∙ Communication: we show how to gain communication efficiency as we have more samples, beyond the informationtheoretic bound on t. Furthermore, the gain is polynomially better than what one may obtain by adapting oneparty algorithms.
For the closeness testing, our protocol has communication s=Θε(n^2/t^2) as long as t is at least the informationtheoretic minimum number of samples. For the independence testing over domain [n]×[m], where n≥m, we obtain s=Oε(n^2m/t^2+nm/t+m^1/2).
∙ Security: we define the concept of secure distribution testing and argue that it must leak at least some minimal information when the promise is not satisfied. We then provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.
∙ Lower bounds: we prove tightness of our tradeoff for the closeness testing, as well as that the independence testing requires tight Ω(m^1/2) communication for unbounded number of samples. These lower bounds are of independent interest as, to the best of our knowledge, these are the first 2party communication lower bounds for testing problems, where the inputs represent a set of i.i.d. samples.
Author Information
Negev Shekel Nosatzki (Columbia University)
More from the Same Authors

2018 : Poster Session »
Phillipp Schoppmann · Patrick Yu · Valerie Chen · Travis Dick · Marc Joye · Ningshan Zhang · Frederik Harder · Olli Saarikivi · Théo Ryffel · Yunhui Long · Théo JOURDAN · Di Wang · Antonio Marcedone · Negev Shekel Nosatzki · Yatharth A Dubey · Antti Koskela · Peter Bloem · Aleksandra Korolova · Martin Bertran · Hao Chen · Galen Andrew · Natalia Martinez · Janardhan Kulkarni · Jonathan PasseratPalmbach · Guillermo Sapiro · Amrita Roy Chowdhury