Poster
Data Amplification: A Unified and Competitive Approach to Property Estimation
Yi Hao · Alon Orlitsky · Ananda Theertha Suresh · Yihong Wu

Wed Dec 5th 10:45 AM -- 12:45 PM @ Room 210 #58

Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just 2n samples to achieve the performance attained by the empirical estimator with n\sqrt{\log n} samples. This provides off-the-shelf, distribution-independent, amplification'' of the amount of data available relative to common-practice estimators.

We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with n samples is even as good as that of the empirical estimator with n\log n samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.

#### Author Information

##### Yi Hao (University of California, San Diego)

Fifth-year Ph.D. student supervised by Prof. Alon Orlitsky at UC San Diego. Broadly interested in Machine Learning, Learning Theory, Algorithm Design, Symbolic and Numerical Optimization. Seeking a summer 2020 internship in Data Science and Machine Learning.