Poster
Towards a Zero-One Law for Column Subset Selection
Zhao Song · David Woodruff · Peilin Zhong
Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #47
There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given $n$-by-$n$ matrix $A$, $\min_{\textrm{rank-}k~B}\|A-B\|_1$, or more generally finding a rank-$k$ matrix $B$ which minimizes the sum of $p$-th powers of absolute values of differences, $\min_{\textrm{rank-}k~B}\|A-B\|_p^p$. Many of these algorithms are linear-time column subset selection algorithms, returning a subset of $\mathrm{poly}(k \log n)$ columns whose cost is no more than a $\mathrm{poly}(k)$ factor larger than the cost of the best rank-$k$ matrix.
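To make the idea of column subset selection under an entrywise $\ell_1$ cost concrete, here is a minimal sketch for the special case $k = 1$. It is a brute-force illustration, not the algorithm from the paper: it tries every column of $A$ as the single selected column and fits every column of $A$ as a scalar multiple of it. The helper names (`l1_fit_scalar`, `l1_cost`, `best_single_column`) are hypothetical. The scalar fit uses the standard fact that $\min_x \sum_i |a_i - x c_i|$ is attained at a weighted median of the ratios $a_i / c_i$ with weights $|c_i|$.

```python
def l1_fit_scalar(c, a):
    """Return x minimizing sum_i |a[i] - x * c[i]| (a weighted median)."""
    # Rows with c[i] == 0 contribute the constant |a[i]| and do not affect x.
    pts = sorted((a_i / c_i, abs(c_i)) for c_i, a_i in zip(c, a) if c_i != 0)
    if not pts:
        return 0.0
    half = sum(w for _, w in pts) / 2.0
    acc = 0.0
    for ratio, w in pts:
        acc += w
        if acc >= half:
            return ratio
    return pts[-1][0]


def l1_cost(A, B):
    """Entrywise l1 cost ||A - B||_1 for matrices given as lists of rows."""
    return sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def best_single_column(A):
    """Brute-force rank-1 column subset selection under entrywise l1 loss.

    Returns (column index, cost, rank-1 approximation B).
    """
    n_rows, n_cols = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    best = None
    for s, c in enumerate(cols):
        # Fit every column of A as a scalar multiple of the selected column c.
        coeffs = [l1_fit_scalar(c, a) for a in cols]
        B = [[c[i] * coeffs[j] for j in range(n_cols)] for i in range(n_rows)]
        cost = l1_cost(A, B)
        if best is None or cost < best[1]:
            best = (s, cost, B)
    return best


if __name__ == "__main__":
    A = [[1, 2, 3], [2, 4, 6], [3, 6, 10]]  # rank 1 except for one entry
    index, cost, _ = best_single_column(A)
    print(index, cost)
```

The brute force costs one pass over all columns per candidate, which is only practical for small matrices; the algorithms discussed in the abstract instead select $\mathrm{poly}(k \log n)$ columns with provable guarantees.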
The above error measures are special cases of the following general entrywise low rank approximation problem: given an arbitrary function $g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$, find a rank-$k$ matrix $B$ which minimizes $\|A-B\|_g = \sum_{i,j}g(A_{i,j}-B_{i,j})$. A natural question is: which functions $g$ admit efficient approximation algorithms? Indeed, this is a central question of recent work studying generalized low rank models. In this work we give approximation algorithms for {\it every} function $g$ which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary. Further, our algorithm is efficient if the function $g$ admits an efficient approximate regression algorithm. Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function. We show that such functions have very different structural properties from $\ell_p$-norms: for example, the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor more columns than for $\ell_p$-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.
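The general cost $\|A-B\|_g$ and the remark about scale-invariance can be illustrated with a short sketch. The code below evaluates the entrywise cost for any $g$ and compares how $|x|$ and the Huber loss react to rescaling the input. The Huber parametrization used here (quadratic $x^2/(2\tau)$ for $|x| \le \tau$, linear $|x| - \tau/2$ otherwise) is one common form and is an assumption, as are the function names `huber`, `entrywise_cost`, and `scaling_ratio`.

```python
def huber(x, tau=1.0):
    """Huber loss (assumed form): quadratic near zero, linear in the tails."""
    ax = abs(x)
    return x * x / (2.0 * tau) if ax <= tau else ax - tau / 2.0


def entrywise_cost(A, B, g):
    """||A - B||_g = sum over entries of g(A[i][j] - B[i][j])."""
    return sum(g(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def scaling_ratio(g, x, c):
    """How much the loss grows when its argument is multiplied by c."""
    return g(c * x) / g(x)


if __name__ == "__main__":
    A = [[0.5, 3.0], [0.0, -2.0]]
    B = [[0.0, 0.0], [0.0, 0.0]]
    print("l1 cost:   ", entrywise_cost(A, B, abs))
    print("Huber cost:", entrywise_cost(A, B, huber))

    # For |x| the ratio depends only on c; for Huber it also depends on x.
    for x in (0.1, 5.0):
        print(x, scaling_ratio(abs, x, 10.0), scaling_ratio(huber, x, 10.0))
```

For $g(x) = |x|^p$ the ratio $g(cx)/g(x)$ equals $|c|^p$ regardless of $x$, which is what scale-invariance means here. For the Huber loss the ratio changes depending on whether $x$ and $cx$ fall in the quadratic or the linear regime, so rescaling $A$ changes the structure of the problem rather than just multiplying the cost by a constant.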
Author Information
Zhao Song (University of Washington)
David Woodruff (Carnegie Mellon University)
Peilin Zhong (Columbia University)
More from the Same Authors
- 2023 Poster: Lower Bounds on Adaptive Sensing for Matrix Recovery
  Praneeth Kacham · David Woodruff
- 2023 Poster: Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
  Gregory Dexter · Petros Drineas · David Woodruff · Taisuke Yasuda
- 2023 Poster: Computing Approximate $\ell_p$ Sensitivities
  Swati Padmanabhan · David Woodruff · Richard Zhang
- 2023 Poster: Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products
  Tamas Sarlos · Xingyou Song · David Woodruff · Richard Zhang
- 2023 Poster: On Robust Streaming for Learning with Experts: Algorithms and Lower Bounds
  David Woodruff · Fred Zhang · Samson Zhou
- 2023 Poster: Near-Optimal $k$-Clustering in the Sliding Window Model
  David Woodruff · Peilin Zhong · Samson Zhou
- 2022 Spotlight: Optimal Query Complexities for Dynamic Trace Estimation
  David Woodruff · Fred Zhang · Richard Zhang
- 2022 Poster: Optimal Query Complexities for Dynamic Trace Estimation
  David Woodruff · Fred Zhang · Richard Zhang
- 2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards
  Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang
- 2019 Poster: Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels
  Michela Meister · Tamas Sarlos · David Woodruff
- 2019 Poster: Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss
  Zhao Song · David Woodruff · Peilin Zhong
- 2019 Poster: On the Convergence Rate of Training Recurrent Neural Networks
  Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song
- 2019 Poster: Efficient and Thrifty Voting by Any Means Necessary
  Debmalya Mandal · Ariel Procaccia · Nisarg Shah · David Woodruff
- 2019 Poster: Regularized Weighted Low Rank Approximation
  Frank Ban · David Woodruff · Richard Zhang
- 2019 Oral: Efficient and Thrifty Voting by Any Means Necessary
  Debmalya Mandal · Ariel Procaccia · Nisarg Shah · David Woodruff
- 2019 Poster: Rethinking Generative Mode Coverage: A Pointwise Guaranteed Approach
  Peilin Zhong · Yuchen Mo · Chang Xiao · Pengyu Chen · Changxi Zheng
- 2019 Poster: Total Least Squares Regression in Input Sparsity Time
  Huaian Diao · Zhao Song · David Woodruff · Xin Yang
- 2019 Poster: Efficient Symmetric Norm Regression via Linear Sketching
  Zhao Song · Ruosong Wang · Lin Yang · Hongyang Zhang · Peilin Zhong
- 2019 Poster: Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
  Huaian Diao · Rajesh Jayaram · Zhao Song · Wen Sun · David Woodruff
- 2018 Poster: Robust Subspace Approximation in a Stream
  Roie Levin · Anish Prasad Sevekari · David Woodruff
- 2018 Spotlight: Robust Subspace Approximation in a Stream
  Roie Levin · Anish Prasad Sevekari · David Woodruff
- 2018 Poster: On Coresets for Logistic Regression
  Alexander Munteanu · Chris Schwiegelshohn · Christian Sohler · David Woodruff
- 2018 Spotlight: On Coresets for Logistic Regression
  Alexander Munteanu · Chris Schwiegelshohn · Christian Sohler · David Woodruff
- 2018 Poster: Sublinear Time Low-Rank Approximation of Distance Matrices
  Ainesh Bakshi · David Woodruff
- 2018 Spotlight: Sublinear Time Low-Rank Approximation of Distance Matrices
  Ainesh Bakshi · David Woodruff
- 2018 Poster: BourGAN: Generative Networks with Metric Embeddings
  Chang Xiao · Peilin Zhong · Changxi Zheng
- 2018 Spotlight: BourGAN: Generative Networks with Metric Embeddings
  Chang Xiao · Peilin Zhong · Changxi Zheng