Timezone: »
Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just 2n samples to achieve the performance attained by the empirical estimator with n\sqrt{\log n} samples. This provides off-the-shelf, distribution-independent, ``amplification'' of the amount of data available relative to common-practice estimators.
We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with n samples is even as good as that of the empirical estimator with n\log n samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.
Author Information
Yi Hao (University of California, San Diego)
Fifth-year Ph.D. student supervised by Prof. Alon Orlitsky at UC San Diego. Broadly interested in Machine Learning, Learning Theory, Algorithm Design, Symbolic and Numerical Optimization. Seeking a summer 2020 internship in Data Science and Machine Learning.
Alon Orlitsky (University of California, San Diego)
Ananda Theertha Suresh (Google)
Yihong Wu (Yale University)
More from the Same Authors
-
2021 : FedJAX: Federated learning simulation with JAX »
Jae Hun Ro · Ananda Theertha Suresh · Ke Wu -
2022 Spotlight: Lightning Talks 6A-2 »
Yichuan Mo · Botao Yu · Gang Li · Zezhong Xu · Haoran Wei · Arsene Fansi Tchango · Raef Bassily · Haoyu Lu · Qi Zhang · Songming Liu · Mingyu Ding · Peiling Lu · Yifei Wang · Xiang Li · Dongxian Wu · Ping Guo · Wen Zhang · Hao Zhongkai · Mehryar Mohri · Rishab Goel · Yisen Wang · Yifei Wang · Yangguang Zhu · Zhi Wen · Ananda Theertha Suresh · Chengyang Ying · Yujie Wang · Peng Ye · Rui Wang · Nanyi Fei · Hui Chen · Yiwen Guo · Wei Hu · Chenglong Liu · Julien Martel · Yuqi Huo · Wu Yichao · Hang Su · Yisen Wang · Peng Wang · Huajun Chen · Xu Tan · Jun Zhu · Ding Liang · Zhiwu Lu · Joumana Ghosn · Shanshan Zhang · Wei Ye · Ze Cheng · Shikun Zhang · Tao Qin · Tie-Yan Liu -
2022 Spotlight: Differentially Private Learning with Margin Guarantees »
Raef Bassily · Mehryar Mohri · Ananda Theertha Suresh -
2022 Poster: Differentially Private Learning with Margin Guarantees »
Raef Bassily · Mehryar Mohri · Ananda Theertha Suresh -
2021 Poster: Learning with User-Level Privacy »
Daniel Levy · Ziteng Sun · Kareem Amin · Satyen Kale · Alex Kulesza · Mehryar Mohri · Ananda Theertha Suresh -
2021 Poster: Boosting with Multiple Sources »
Corinna Cortes · Mehryar Mohri · Dmitry Storcheus · Ananda Theertha Suresh -
2021 Poster: Breaking the centralized barrier for cross-device federated learning »
Sai Praneeth Karimireddy · Martin Jaggi · Satyen Kale · Mehryar Mohri · Sashank Reddi · Sebastian Stich · Ananda Theertha Suresh -
2021 Poster: Optimal prediction of Markov chains with and without spectral gap »
Yanjun Han · Soham Jana · Yihong Wu -
2021 Poster: Remember What You Want to Forget: Algorithms for Machine Unlearning »
Ayush Sekhari · Jayadev Acharya · Gautam Kamath · Ananda Theertha Suresh -
2020 Poster: Linear-Sample Learning of Low-Rank Distributions »
Ayush Jain · Alon Orlitsky -
2020 Poster: A General Method for Robust Learning from Batches »
Ayush Jain · Alon Orlitsky -
2020 Poster: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Poster: SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm »
Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar -
2020 Spotlight: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Poster: Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions »
Yi Hao · Alon Orlitsky -
2020 Poster: Learning discrete distributions: user vs item-level privacy »
Yuhan Liu · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Michael D Riley -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: Unified Sample-Optimal Property Estimation in Near-Linear Time »
Yi Hao · Alon Orlitsky -
2019 Poster: The Broad Optimality of Profile Maximum Likelihood »
Yi Hao · Alon Orlitsky -
2019 Spotlight: The Broad Optimality of Profile Maximum Likelihood »
Yi Hao · Alon Orlitsky -
2019 Poster: Sampled Softmax with Random Fourier Features »
Ankit Singh Rawat · Jiecao Chen · Felix Xinnan Yu · Ananda Theertha Suresh · Sanjiv Kumar -
2019 Poster: Differentially Private Anonymized Histograms »
Ananda Theertha Suresh -
2018 Poster: On Learning Markov Chains »
Yi Hao · Alon Orlitsky · Venkatadheeraj Pichapati -
2018 Poster: Entropy Rate Estimation for Markov Chains with Large State Space »
Yanjun Han · Jiantao Jiao · Chuan-Zheng Lee · Tsachy Weissman · Yihong Wu · Tiancheng Yu -
2018 Spotlight: Entropy Rate Estimation for Markov Chains with Large State Space »
Yanjun Han · Jiantao Jiao · Chuan-Zheng Lee · Tsachy Weissman · Yihong Wu · Tiancheng Yu -
2018 Poster: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2018 Spotlight: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 Poster: Multiscale Quantization for Fast Similarity Search »
Xiang Wu · Ruiqi Guo · Ananda Theertha Suresh · Sanjiv Kumar · Daniel Holtmann-Rice · David Simcha · Felix Yu -
2017 Poster: The power of absolute discounting: all-dimensional distribution estimation »
Moein Falahatgar · Mesrob Ohannessian · Alon Orlitsky · Venkatadheeraj Pichapati -
2017 Poster: Model-Powered Conditional Independence Test »
Rajat Sen · Ananda Theertha Suresh · Karthikeyan Shanmugam · Alex Dimakis · Sanjay Shakkottai -
2017 Poster: Maxing and Ranking with Few Assumptions »
Moein Falahatgar · Yi Hao · Alon Orlitsky · Venkatadheeraj Pichapati · Vaishakh Ravindrakumar -
2016 Poster: Near-Optimal Smoothing of Structured Conditional Probability Matrices »
Moein Falahatgar · Mesrob Ohannessian · Alon Orlitsky -
2015 Poster: Competitive Distribution Estimation: Why is Good-Turing Good »
Alon Orlitsky · Ananda Theertha Suresh -
2015 Oral: Competitive Distribution Estimation: Why is Good-Turing Good »
Alon Orlitsky · Ananda Theertha Suresh -
2014 Poster: Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures »
Ananda Theertha Suresh · Alon Orlitsky · Jayadev Acharya · Ashkan Jafarpour -
2012 Poster: Tight Bounds on Redundancy and Distinguishability of Label-Invariant Distributions »
Jayadev Acharya · Hirakendu Das · Alon Orlitsky