Timezone: »
We present a novel approach to estimating discrete distributions with (potentially) infinite support in the total variation metric. In a departure from the established paradigm, we make no structural assumptions whatsoever on the sampling distribution. In such a setting, distribution-free risk bounds are impossible, and the best one could hope for is a fully empirical data-dependent bound. We derive precisely such bounds, and demonstrate that these are, in a well-defined sense, the best possible. Our main discovery is that the half-norm of the empirical distribution provides tight upper and lower estimates on the empirical risk. Furthermore, this quantity decays at a nearly optimal rate as a function of the true distribution. The optimality follows from a minimax result, of possible independent interest. Additional structural results are provided, including an exact Rademacher complexity calculation and apparently a first connection between the total variation risk and the missing mass.
Author Information
Doron Cohen (Ben-Gurion University of the Negev)
Aryeh Kontorovich (Ben Gurion University)
Geoffrey Wolfer (Ben-Gurion University of the Negev)
More from the Same Authors
-
2023 Poster: Near-optimal learning with average Hölder smoothness »
Guy Kornowski · Aryeh Kontorovich · Steve Hanneke -
2021 Poster: Dimension-free empirical entropy estimation »
Doron Cohen · Aryeh Kontorovich · Aaron Koolyk · Geoffrey Wolfer -
2018 Poster: Learning convex polytopes with margin »
Lee-Ad Gottlieb · Eran Kaufman · Aryeh Kontorovich · Gabriel Nivasch -
2017 Poster: Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions »
Aryeh Kontorovich · Sivan Sabato · Roi Weiss -
2016 Poster: Active Nearest-Neighbor Learning in Metric Spaces »
Aryeh Kontorovich · Sivan Sabato · Ruth Urner -
2015 Poster: Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path »
Daniel Hsu · Aryeh Kontorovich · Csaba Szepesvari -
2014 Poster: Near-optimal sample compression for nearest neighbors »
Lee-Ad Gottlieb · Aryeh Kontorovich · Pinhas Nisnevitch -
2014 Poster: Consistency of weighted majority votes »
Daniel Berend · Aryeh Kontorovich -
2013 Poster: Predictive PAC Learning and Process Decompositions »
Cosma Shalizi · Aryeh Kontorovich