Spotlight
List-Decodable Mean Estimation in Nearly-PCA Time
Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian
Abstract:
Robust statistics has traditionally focused on designing estimators tolerant to a minority of contaminated data. {\em List-decodable learning}~\cite{CharikarSV17} studies the more challenging regime where only a minority $\frac{1}{k}$ fraction of the dataset, for $k \geq 2$, is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new algorithm for bounded covariance distributions with optimal sample complexity and near-optimal error guarantee, running in {\em nearly-PCA time}. Assuming the ground truth distribution on $\mathbb{R}^d$ has identity-bounded covariance, our algorithm outputs $O(k)$ candidate means, one of which is within distance $O(\sqrt{k \log k})$ of the truth. Our algorithm runs in time $\widetilde{O}(ndk)$, where $n$ is the dataset size. This runtime nearly matches the cost of performing $k$-PCA on the data, a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest runtimes were $\widetilde{O}(n^2 d k^2)$~\cite{DiakonikolasKK20} and $\widetilde{O}(n d k^C)$~\cite{CherapanamjeriMY20} for an unspecified constant $C \geq 6$. Our approach builds on a novel soft downweighting method we term SIFT, arguably the simplest known polynomial-time mean estimator in the list-decodable setting. To develop our fast algorithms, we improve the runtime of SIFT via a careful ``win-win-win'' analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which may be of independent interest.
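To make the problem setup concrete, the following is a minimal illustrative baseline for list-decodable mean estimation, not the paper's SIFT algorithm: project the data onto its top-$k$ principal directions (the $k$-PCA step the abstract identifies as a natural bottleneck), cluster in that low-dimensional space, and output one candidate mean per cluster. All function names and parameters here are hypothetical choices for the sketch.

```python
import numpy as np

def naive_list_decode(X, k, seed=0):
    """Toy baseline (NOT the paper's algorithm): k-PCA projection
    followed by simple k-means, returning up to k candidate means."""
    rng = np.random.default_rng(seed)
    Y = X - X.mean(axis=0)
    # top-k principal directions via SVD (the k-PCA step)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    P = Y @ Vt[:k].T                      # n x k projected data
    # plain k-means in the projected space
    centers = P[rng.choice(len(P), k, replace=False)]
    for _ in range(25):
        labels = np.argmin(((P[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = P[labels == j].mean(axis=0)
    # lift each cluster back to a candidate mean in R^d
    return [X[labels == j].mean(axis=0) for j in range(k) if (labels == j).any()]
```

In the list-decodable regime only a $\frac{1}{k}$ fraction of points are inliers, so no single estimate can succeed; the algorithm instead outputs a short list, one entry of which should be close to the true mean. This baseline has no error guarantee under adversarial outliers; the point is only to show the input/output contract and where the $k$-PCA cost enters.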