Poster
Hypothesis Selection with Memory Constraints
Maryam Aliakbarpour · Mark Bun · Adam Smith
Hypothesis selection is a fundamental problem in learning theory and statistics. Given a dataset and a finite set of candidate distributions, the goal is to select a distribution that matches the data as well as possible. More specifically, suppose we have sample access to an unknown distribution $P$ over a domain $\mathcal{X}$ that we know is well-approximated by one of a class of $n$ distributions (a.k.a. hypotheses), $\mathcal{H} \coloneqq \{H_1, H_2, \ldots, H_n\}$. The goal is to design an algorithm that outputs a distribution $\hat{H} \in \mathcal{H}$ whose total variation distance from $P$ is nearly minimal.

In this work, we study the hypothesis selection problem under memory constraints. We consider a model where samples from $P$ are presented in a stream and we access each sample $x$ via "PDF-comparison" queries that allow us to compare the probability densities of any pair of hypotheses at the domain point $x$ (i.e., is $H_i(x) < H_j(x)$?). This model allows us to study how much memory is needed at any point in time to store information about the portion of the stream seen so far.

Our main result is an algorithm that achieves a nearly optimal tradeoff between memory usage and the number of samples required. In particular, given $b$ bits of memory (for $b$ roughly between $\log n$ and $n$), our algorithm solves the hypothesis selection problem with $s$ samples, where $b \cdot s = O(n \log n)$. This result is optimal up to an $O(\log n)$ factor, for all $b$.
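To make the PDF-comparison access model concrete, here is a minimal sketch of the classic (memory-unconstrained) Scheffé tournament, the textbook baseline for hypothesis selection: each pairwise test only queries whether $H_i(x) > H_j(x)$ at sample points, exactly the query type the abstract describes. This is an illustration of the baseline, not the paper's memory-optimal streaming algorithm; all function names and the toy discrete distributions are illustrative assumptions.

```python
import random

def pdf_compare(Hi, Hj, x):
    """PDF-comparison query: is Hi(x) > Hj(x) at domain point x?
    Hypotheses are toy discrete distributions given as {point: mass} dicts."""
    return Hi.get(x, 0.0) > Hj.get(x, 0.0)

def scheffe_test(Hi, Hj, samples):
    """Scheffé test between Hi and Hj on the Scheffé set
    A = {x : Hi(x) > Hj(x)}.  Membership of each sample in A is decided
    purely via PDF-comparison queries; the winner is the hypothesis whose
    mass on A is closer to the empirical mass of A."""
    support = set(Hi) | set(Hj)
    mass_i = sum(Hi.get(x, 0.0) for x in support if pdf_compare(Hi, Hj, x))
    mass_j = sum(Hj.get(x, 0.0) for x in support if pdf_compare(Hi, Hj, x))
    emp = sum(pdf_compare(Hi, Hj, x) for x in samples) / len(samples)
    return Hi if abs(mass_i - emp) <= abs(mass_j - emp) else Hj

def tournament_select(hypotheses, samples):
    """Round-robin tournament: return the hypothesis with the most
    pairwise Scheffé wins.  Storing all samples and all win counts is what
    the paper's memory-constrained model avoids."""
    wins = [0] * len(hypotheses)
    for i in range(len(hypotheses)):
        for j in range(i + 1, len(hypotheses)):
            winner = scheffe_test(hypotheses[i], hypotheses[j], samples)
            wins[i if winner is hypotheses[i] else j] += 1
    return hypotheses[max(range(len(hypotheses)), key=wins.__getitem__)]

# Toy usage: P equals H1, so the tournament should select H1.
random.seed(0)
H1 = {0: 0.9, 1: 0.1}
H2 = {0: 0.1, 1: 0.9}
H3 = {0: 0.5, 1: 0.5}
samples = random.choices([0, 1], weights=[0.9, 0.1], k=2000)
best = tournament_select([H1, H2, H3], samples)
```

Note the contrast with the paper's setting: the sketch above keeps all $s$ samples and $n$ win counters in memory, whereas the abstract's streaming model asks how few bits $b$ suffice when each sample is seen once, yielding the tradeoff $b \cdot s = O(n \log n)$.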
Author Information
Maryam Aliakbarpour (Rice University)
Mark Bun (Boston University)
Adam Smith (Boston University)
More from the Same Authors
- 2021 Spotlight: Covariance-Aware Private Mean Estimation Without Private Covariance Estimation »
  Gavin Brown · Marco Gaboardi · Adam Smith · Jonathan Ullman · Lydia Zakynthinou
- 2021 Spotlight: Differentially Private Model Personalization »
  Prateek Jain · John Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2023 Poster: Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation »
  Palak Jain · Iden Kalemaj · Sofya Raskhodnikova · Satchit Sivakumar · Adam Smith
- 2022 Poster: Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams »
  Sergey Denisov · H. Brendan McMahan · John Rush · Adam Smith · Abhradeep Guha Thakurta
- 2022 Poster: Estimation of Entropy in Constant Space with Improved Sample Complexity »
  Maryam Aliakbarpour · Andrew McGregor · Jelani Nelson · Erik Waingarten
- 2021 Poster: Differentially Private Model Personalization »
  Prateek Jain · John Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2021 Poster: Differentially Private Sampling from Distributions »
  Sofya Raskhodnikova · Satchit Sivakumar · Adam Smith · Marika Swanberg
- 2021 Poster: Multiclass versus Binary Differentially Private PAC Learning »
  Satchit Sivakumar · Mark Bun · Marco Gaboardi
- 2021 Poster: Covariance-Aware Private Mean Estimation Without Private Covariance Estimation »
  Gavin Brown · Marco Gaboardi · Adam Smith · Jonathan Ullman · Lydia Zakynthinou
- 2020 Poster: A Computational Separation between Private Learning and Online Learning »
  Mark Bun
- 2018: Invited talk 4: Models for private data analysis of distributed data »
  Adam Smith
- 2018 Poster: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
  Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro
- 2018 Spotlight: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
  Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro