Poster
Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers
Tommaso d'Orsi · Chih-Hung Liu · Rajai Nasser · Gleb Novikov · David Steurer · Stefan Tiegel
We develop machinery to design efficiently computable and \emph{consistent} estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples.

As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n\gtrsim (k\log d)/\alpha^2$ and optimal error rate $O(\sqrt{(k\log d)/(n\cdot \alpha^2)})$, where $n$ is the number of observations, $d$ is the number of dimensions and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only been shown in the dense setting (i.e., general linear regression) very recently by d'Orsi et al.~\cite{ICMLlinearregression}.

In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$).

To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi et al.'s approach~\cite{ICMLlinearregression} in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems. We complement these algorithmic results with statistical lower bounds showing that the fraction of inliers our PCA estimator can tolerate is optimal up to a constant factor.
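For intuition, the sparse-regression estimator (the Huber loss equipped with an $\ell_1$ regularizer) can be sketched as a proximal-gradient (ISTA) minimization. This is an illustrative implementation under assumed parameter choices, not the paper's analyzed procedure; the function names and the values of `delta` and `lam` are hypothetical.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    # Derivative of the Huber penalty: identity near zero, clipped in the tails,
    # so each grossly corrupted response contributes a bounded gradient term.
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    # Proximal operator of the l1 norm (coordinate-wise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.0, n_iter=1000):
    """Approximately minimize
        (1/n) * sum_i Huber_delta(y_i - <x_i, beta>) + lam * ||beta||_1
    by proximal gradient descent (ISTA)."""
    n, d = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient Lipschitz
    # constant of the smooth Huber part.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -(X.T @ huber_grad(r, delta)) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

Because the Huber gradient is bounded, samples whose responses were replaced by an oblivious adversary contribute at most $\delta \|x_i\|$ to the gradient, while the $\ell_1$ proximal step keeps the iterates sparse; this mirrors, in simplified form, the role the two components play in the abstract above.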
Author Information
Tommaso d'Orsi (Swiss Federal Institute of Technology)
Chih-Hung Liu (Swiss Federal Institute of Technology)
Rajai Nasser (Swiss Federal Institute of Technology)
Gleb Novikov (ETH Zürich)
David Steurer (ETH Zurich)
Stefan Tiegel (ETH Zürich)
More from the Same Authors

2021 Poster: The Complexity of Sparse Tensor PCA »
Davin Choo · Tommaso d'Orsi 
2020 Poster: Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks »
Jingqiu Ding · Sam Hopkins · David Steurer 
2020 Spotlight: Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks »
Jingqiu Ding · Sam Hopkins · David Steurer