NeurIPS 2021 List-Decodable Mean Estimation in Nearly-PCA Time Spotlight

Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian

Abstract: Robust statistics has traditionally focused on designing estimators tolerant to a minority of contaminated data. {\em List-decodable learning}~\cite{CharikarSV17} studies the more challenging regime where only a minority $\tfrac{1}{k}$ fraction of the dataset, $k \geq 2$, is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new algorithm for bounded covariance distributions with optimal sample complexity and a near-optimal error guarantee, running in {\em nearly-PCA time}. Assuming the ground truth distribution on $\mathbb{R}^d$ has identity-bounded covariance, our algorithm outputs $O(k)$ candidate means, one of which is within distance $O(\sqrt{k\log k})$ of the truth. Our algorithm runs in time $\widetilde{O}(ndk)$, where $n$ is the dataset size. This runtime nearly matches the cost of performing $k$-PCA on the data, a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest runtimes were $\widetilde{O}(n^2 d k^2)$~\cite{DiakonikolasKK20} and $\widetilde{O}(nd k^C)$~\cite{CherapanamjeriMY20} for an unspecified constant $C \geq 6$. Our approach builds on a novel soft downweighting method we term SIFT, arguably the simplest known polynomial-time mean estimator in the list-decodable setting. To develop our fast algorithms, we reduce the computational cost of SIFT via a careful ``win-win-win'' analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which may be of independent interest.
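To make the comparison with "the cost of performing $k$-PCA" concrete, here is a minimal sketch of subspace (block power) iteration, one standard way to approximate the top-$k$ principal subspace. It is not taken from the paper. It assumes NumPy, and the iteration count `iters` and the function name `top_k_pca` are hypothetical choices for illustration. Each iteration touches the centered $n \times d$ data only through two products with a $d \times k$ block, so it costs $O(ndk)$ arithmetic (plus $O(dk^2)$ for re-orthonormalization), and the total cost is that amount times the number of iterations.

```python
import numpy as np


def top_k_pca(X, k, iters=50, seed=0):
    """Approximate the top-k principal subspace of the rows of X (hypothetical helper).

    Subspace iteration: repeatedly apply the unnormalized covariance Xc^T Xc
    to a d x k block and re-orthonormalize. The d x d covariance is never formed.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # center the data
    n, d = Xc.shape
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal start
    for _ in range(iters):
        Z = Xc.T @ (Xc @ Q)                      # two (n x d)-by-(d x k) style products: O(ndk)
        Q, _ = np.linalg.qr(Z)                   # re-orthonormalize the block: O(dk^2)
    return Q                                     # d x k, orthonormal columns


# Usage: plant two high-variance directions (coordinates 0 and 1) in 20 dimensions.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 20))
X[:, :2] *= 10.0
Q = top_k_pca(X, k=2)
```

Here the recovered columns of `Q` should span (approximately) the first two coordinate directions, since those carry variance near 100 versus near 1 elsewhere.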
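The list-decoding guarantee ("$O(k)$ candidate means, one of which is within distance $O(\sqrt{k\log k})$ of the truth") is measured by the distance from the true mean to the *closest* candidate. The sketch below, assuming NumPy, computes that error on a toy instance. The helper names, the Gaussian inliers (which have identity covariance, so they satisfy the bounded covariance assumption), and the decoy-cluster "adversary" are all hypothetical choices for illustration; the oracle list uses knowledge of which points are inliers and is not an algorithm, let alone SIFT.

```python
import numpy as np


def list_decoding_error(candidates, true_mean):
    """Distance from true_mean to the nearest candidate (hypothetical helper).

    Most candidates may be far from the truth; only the best one counts.
    """
    candidates = np.atleast_2d(np.asarray(candidates, dtype=float))
    return float(np.min(np.linalg.norm(candidates - true_mean, axis=1)))


def make_toy_instance(n, d, k, seed=0):
    """Toy instance (hypothetical), k >= 2: the first n // k rows are drawn from
    N(mu, I_d); the remaining rows form k - 1 decoy Gaussian clusters that look
    like equally plausible inlier populations."""
    rng = np.random.default_rng(seed)
    n_in = n // k
    mu = np.zeros(d)
    inliers = mu + rng.standard_normal((n_in, d))
    decoy_centers = 20.0 * rng.standard_normal((k - 1, d))
    labels = rng.integers(0, k - 1, size=n - n_in)   # assign each outlier to a decoy
    outliers = decoy_centers[labels] + rng.standard_normal((n - n_in, d))
    return np.vstack([inliers, outliers]), mu, decoy_centers


# Usage: with k = 3, a single estimate such as the overall mean is dragged toward
# the decoys, while a list containing the inlier mean has small list-decoding error.
X, mu, decoys = make_toy_instance(n=3000, d=10, k=3)
oracle_list = np.vstack([decoys, X[:1000].mean(axis=0)])
```

This also shows why a list is necessary: from the data alone, the $k$ clusters are indistinguishable, so no single output can be guaranteed close to the truth.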