Timezone: »
Poster
Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms
Siu On Chan · Ilias Diakonikolas · Rocco A Servedio · Xiaorui Sun
Let $p$ be an unknown and arbitrary probability distribution over $[0 ,1)$. We consider the problem of \emph{density estimation}, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $\eps$, we give an algorithm that makes $\tilde{O}(k/\eps^2)$ draws from $p$, runs in $\tilde{O}(k/\eps^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\eps))$ pieces. With high probability the hypothesis $h$ satisfies $\dtv(p,h) \leq C \cdot \opt_k(p) + \eps$, where $\dtv$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\opt_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are both optimal up to logarithmic factors. The ``approximation factor'' $C$ that is present in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $\eps$ can achieve $C < 2$ regardless of what kind of hypothesis distribution it uses.
Author Information
Siu On Chan (Microsoft Research)
Ilias Diakonikolas (University of Wisconsin-Madison)
Rocco A Servedio (Columbia University)
Xiaorui Sun (Columbia University)
More from the Same Authors
-
2023 Poster: SQ Lower Bounds for Learning Mixtures of Linear Classifiers »
Ilias Diakonikolas · Daniel Kane · Yuxin Sun -
2023 Poster: Efficient Testable Learning of Halfspaces with Adversarial Label Noise »
Ilias Diakonikolas · Daniel Kane · Vasilis Kontonis · Sihan Liu · Nikos Zarifis -
2023 Poster: Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression »
Ilias Diakonikolas · Daniel Kane · Ankit Pensia · Thanasis Pittas -
2023 Poster: Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise »
Ilias Diakonikolas · Jelena Diakonikolas · Daniel Kane · Puqian Wang · Nikos Zarifis -
2023 Poster: SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions »
Ilias Diakonikolas · Daniel Kane · Lisheng Ren · Yuxin Sun -
2023 Poster: A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm »
Ilias Diakonikolas · Daniel Kane · Jasper Lee · Ankit Pensia · Thanasis Pittas -
2023 Poster: First Order Stochastic Optimization with Oblivious Noise »
Ilias Diakonikolas · Sushrut Karmalkar · Jong Ho Park · Christos Tzamos -
2023 Poster: Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing »
Shuyao Li · Yu Cheng · Ilias Diakonikolas · Jelena Diakonikolas · Rong Ge · Stephen Wright -
2022 Poster: SQ Lower Bounds for Learning Single Neurons with Massart Noise »
Ilias Diakonikolas · Daniel Kane · Lisheng Ren · Yuxin Sun -
2022 Poster: Nearly-Tight Bounds for Testing Histogram Distributions »
Clément L Canonne · Ilias Diakonikolas · Daniel Kane · Sihan Liu -
2022 Poster: List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering »
Ilias Diakonikolas · Daniel Kane · Sushrut Karmalkar · Ankit Pensia · Thanasis Pittas -
2022 Poster: Cryptographic Hardness of Learning Halfspaces with Massart Noise »
Ilias Diakonikolas · Daniel Kane · Pasin Manurangsi · Lisheng Ren -
2022 Poster: Outlier-Robust Sparse Estimation via Non-Convex Optimization »
Yu Cheng · Ilias Diakonikolas · Rong Ge · Shivam Gupta · Daniel Kane · Mahdi Soltanolkotabi -
2022 Poster: Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions »
Ilias Diakonikolas · Daniel Kane · Jasper Lee · Ankit Pensia -
2018 Poster: Robust Learning of Fixed-Structure Bayesian Networks »
Yu Cheng · Ilias Diakonikolas · Daniel Kane · Alistair Stewart -
2018 Poster: Sharp Bounds for Generalized Uniformity Testing »
Ilias Diakonikolas · Daniel M. Kane · Alistair Stewart -
2018 Poster: Testing for Families of Distributions via the Fourier Transform »
Alistair Stewart · Ilias Diakonikolas · Clément L Canonne -
2018 Spotlight: Sharp Bounds for Generalized Uniformity Testing »
Ilias Diakonikolas · Daniel M. Kane · Alistair Stewart -
2017 Poster: Linear regression without correspondence »
Daniel Hsu · Kevin Shi · Xiaorui Sun -
2011 Poster: Learning large-margin halfspaces with more malicious noise »
Phil Long · Rocco A Servedio -
2011 Poster: Algorithms and hardness results for parallel large margin learning »
Rocco A Servedio · Phil Long -
2011 Spotlight: Algorithms and hardness results for parallel large margin learning »
Rocco A Servedio · Phil Long -
2008 Poster: Adaptive Martingale Boosting »
Phil Long · Rocco A Servedio -
2008 Spotlight: Adaptive Martingale Boosting »
Phil Long · Rocco A Servedio -
2007 Spotlight: One-Pass Boosting »
Zafer Barutcuoglu · Phil Long · Rocco A Servedio -
2007 Poster: One-Pass Boosting »
Zafer Barutcuoglu · Phil Long · Rocco A Servedio -
2007 Spotlight: Boosting the Area under the ROC Curve »
Phil Long · Rocco A Servedio -
2007 Poster: Boosting the Area under the ROC Curve »
Phil Long · Rocco A Servedio -
2006 Poster: Attribute-efficient learning of linear threshold functions under unconcentrated distributions »
Phil Long · Rocco A Servedio