
Estimating Entropy of Distributions in Constant Space
Jayadev Acharya · Sourbh Bhadane · Piotr Indyk · Ziteng Sun

Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #237
We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log^2 (1/\varepsilon)}{\varepsilon^3}\right)$ samples and only $O(1)$ words of memory, and outputs a $\pm\varepsilon$ estimate of $H(p)$. Without space limitations, the sample complexity has been established as $S(k,\varepsilon)=\Theta\left(\frac k{\varepsilon\log k}+\frac{\log^2 k}{\varepsilon^2}\right)$, which is sub-linear in the domain size $k$; however, existing algorithms that achieve this optimal sample complexity also require space nearly linear in $k$. Our algorithm partitions $[0,1]$ into intervals and estimates the entropy contribution of the probability values falling in each interval. The intervals are designed to trade off bias and variance. Distribution property estimation and testing with limited memory is a largely unexplored research area, and we hope our work will motivate further research in this field.
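To make the streaming setting concrete, here is a minimal sketch of a constant-space entropy estimator in the same spirit: it uses the identity $H(p) = \mathbb{E}[-\log p_X]$ for $X \sim p$, reading one sample $x$ from the stream and then estimating $p_x$ by counting how often $x$ reappears in a short window, storing only $O(1)$ words per trial. This is a simplified illustration, not the paper's algorithm; the window size, repeat count, and smoothing are assumptions, and the paper's interval partition of $[0,1]$, which controls the bias/variance trade-off, is omitted here.

```python
import math
import random

def stream_entropy_estimate(sample_stream, window=1000, repeats=200):
    """Hedged sketch of a constant-space streaming entropy estimator.

    For each trial: read one sample x, then count occurrences of x in the
    next `window` stream samples to form p_hat, and accumulate -log(p_hat).
    Each trial keeps only a handful of scalars, i.e. O(1) words of memory.
    """
    total = 0.0
    for _ in range(repeats):
        x = next(sample_stream)
        hits = sum(1 for _ in range(window) if next(sample_stream) == x)
        # Add-one smoothing keeps p_hat > 0 even if x never reappears.
        p_hat = (hits + 1) / (window + 1)
        total += -math.log(p_hat)
    return total / repeats  # estimate of H(p) in nats

def iid_stream(probs, seed=1):
    """Infinite i.i.d. sample stream from a k-ary distribution (for testing)."""
    rng = random.Random(seed)
    symbols = list(range(len(probs)))
    while True:
        yield rng.choices(symbols, weights=probs)[0]

# Example: a 4-ary distribution with known entropy.
probs = [0.5, 0.25, 0.125, 0.125]
true_H = -sum(p * math.log(p) for p in probs)   # about 1.21 nats
est = stream_entropy_estimate(iid_stream(probs))
```

The estimator is biased (Jensen's inequality makes $\mathbb{E}[-\log \hat{p}]$ exceed $-\log p$, and the bias grows as $p$ shrinks below $1/\text{window}$), which is precisely the regime the paper's interval construction is designed to handle.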

Author Information

Jayadev Acharya (Cornell University)
Sourbh Bhadane (Cornell University)
Piotr Indyk (MIT)
Ziteng Sun (Cornell University)
