Timezone: »
Poster
Learning Populations of Parameters
Kevin Tian · Weihao Kong · Gregory Valiant
Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0,1]$, and we observe $n$ independent random variables, $X_1,\ldots,X_n$, with $X_i \sim $ Binomial$(t, p_i)$. How accurately can one recover the ``histogram'' (i.e. cumulative density function) of the $p_i$'s? While the empirical estimates would recover the histogram to earth mover distance $\Theta(\frac{1}{\sqrt{t}})$ (equivalently, $\ell_1$ distance between the CDFs), we show that, provided $n$ is sufficiently large, we can achieve error $O(\frac{1}{t})$ which is information theoretically optimal. We also extend our results to the multi-dimensional parameter case, capturing settings where each member of the population has multiple associated parameters. Beyond the theoretical results, we demonstrate that the recovery algorithm performs well in practice on a variety of datasets, providing illuminating insights into several domains, including politics, sports analytics, and variation in the gender ratio of offspring.
Author Information
Kevin Tian (Stanford University)
Weihao Kong (Stanford University)
Gregory Valiant (Stanford University)
More from the Same Authors
-
2021 Spotlight: List-Decodable Mean Estimation in Nearly-PCA Time »
Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian -
2021 Poster: Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions »
Yin Tat Lee · Ruoqi Shen · Kevin Tian -
2021 Poster: Robust Regression Revisited: Acceleration and Improved Estimation Rates »
Arun Jambulapati · Jerry Li · Tselil Schramm · Kevin Tian -
2021 Poster: List-Decodable Mean Estimation in Nearly-PCA Time »
Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian -
2021 Oral: Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions »
Yin Tat Lee · Ruoqi Shen · Kevin Tian -
2020 Poster: Acceleration with a Ball Optimization Oracle »
Yair Carmon · Arun Jambulapati · Qijia Jiang · Yujia Jin · Yin Tat Lee · Aaron Sidford · Kevin Tian -
2020 Oral: Acceleration with a Ball Optimization Oracle »
Yair Carmon · Arun Jambulapati · Qijia Jiang · Yujia Jin · Yin Tat Lee · Aaron Sidford · Kevin Tian -
2020 Poster: Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing »
Arun Jambulapati · Jerry Li · Kevin Tian -
2020 Spotlight: Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing »
Arun Jambulapati · Jerry Li · Kevin Tian -
2019 Poster: Variance Reduction for Matrix Games »
Yair Carmon · Yujia Jin · Aaron Sidford · Kevin Tian -
2019 Oral: Variance Reduction for Matrix Games »
Yair Carmon · Yujia Jin · Aaron Sidford · Kevin Tian -
2019 Poster: A Direct tilde{O}(1/epsilon) Iteration Parallel Algorithm for Optimal Transport »
Arun Jambulapati · Aaron Sidford · Kevin Tian -
2017 Poster: Learning Overcomplete HMMs »
Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant -
2016 Poster: Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction »
Jacob Steinhardt · Gregory Valiant · Moses Charikar -
2015 : When Your Big Data Seems Too Small »
Gregory Valiant -
2015 Poster: Testing Closeness With Unequal Sized Samples »
Bhaswar Bhattacharya · Gregory Valiant