

Poster

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Siu On Chan · Ilias Diakonikolas · Rocco A Servedio · Xiaorui Sun

Level 2, room 210D

Abstract: Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $\epsilon$, we give an algorithm that makes $\tilde{O}(k/\epsilon^2)$ draws from $p$, runs in $\tilde{O}(k/\epsilon^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\epsilon))$ pieces. With high probability the hypothesis $h$ satisfies $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$, where $d_{\mathrm{TV}}$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are both optimal up to logarithmic factors. The "approximation factor" $C$ that is present in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $\epsilon$ can achieve $C < 2$ regardless of what kind of hypothesis distribution it uses.
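To make the objects in the abstract concrete, here is a minimal Python sketch of a variable-width histogram fit by equal-mass binning: each of the $k$ bins receives roughly the same number of samples, so bin widths adapt to the data, and the hypothesis is a piecewise constant density. This is an illustrative baseline only, not the paper's algorithm, and it carries no approximation guarantee; the function names `equal_mass_histogram` and `eval_density` are hypothetical.

```python
import numpy as np

def equal_mass_histogram(samples, k):
    """Fit a k-piece variable-width histogram on [0, 1) by equal-mass binning.

    Illustrative baseline only (NOT the paper's algorithm): interior bin
    edges are placed at empirical quantiles, so each bin holds ~n/k samples.
    Returns the bin edges and the constant density value on each bin.
    """
    x = np.sort(np.asarray(samples))
    n = len(x)
    edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1))
    edges[0], edges[-1] = 0.0, 1.0          # cover all of [0, 1)
    edges = np.maximum.accumulate(edges)    # guard against ties in the data
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    # Piecewise constant density: (mass fraction) / (bin width),
    # with zero-width bins handled safely.
    dens = np.where(widths > 0, counts / (n * np.maximum(widths, 1e-12)), 0.0)
    return edges, dens

def eval_density(edges, dens, t):
    """Evaluate the piecewise constant hypothesis density at points t."""
    idx = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, len(dens) - 1)
    return dens[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Samples from a distribution that is exactly 3-piecewise constant.
    samples = np.concatenate([rng.uniform(0.0, 0.2, 5000),
                              rng.uniform(0.2, 0.9, 2000),
                              rng.uniform(0.9, 1.0, 3000)])
    edges, dens = equal_mass_histogram(samples, k=12)
    print(eval_density(edges, dens, np.array([0.1, 0.5, 0.95])))
```

In the paper's terms, the distribution sampled above has $\mathrm{opt}_3(p) = 0$, so a good learner with $k = 3$ should return a hypothesis $h$ with $d_{\mathrm{TV}}(p,h)$ close to $\epsilon$; the equal-mass sketch merely shows what a variable-width, piecewise constant hypothesis looks like.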
