

Poster

Oja's Algorithm for Streaming Sparse PCA

Syamantak Kumar · Purnamrita Sarkar

West Ballroom A-D #6904
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Oja's algorithm for Streaming Principal Component Analysis (PCA) for n data points in a d-dimensional space achieves the same sin-squared error O(r_eff/n) as the offline algorithm, in O(d) space, O(nd) time, and a single pass through the data points. Here r_eff is the effective rank (the ratio of the trace to the principal eigenvalue of the population covariance matrix Σ). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of Σ is s-sparse and r_eff can be large. In this setting, to our knowledge, *there are no known single-pass algorithms* that achieve the minimax error bound in O(d) space and O(nd) time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in O(d) space and O(nd) time. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which have been carried out when r_eff is bounded.
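For readers who want a concrete picture of the procedure the abstract describes, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It runs a standard single-pass Oja update and then thresholds the resulting vector. Several details are assumptions made only for illustration: the abstract does not specify the thresholding rule, so the sketch keeps the s largest-magnitude entries; the step size `eta`, the function names, and the synthetic spiked covariance in `demo` (chosen purely for convenience, even though the paper's result does not require a spiked model) are all hypothetical choices.

```python
import numpy as np


def oja_vector(X, eta, rng):
    """Single pass of Oja's update over the rows of X, using O(d) memory."""
    d = X.shape[1]
    w = rng.standard_normal(d)          # random initial vector
    w /= np.linalg.norm(w)
    for x in X:                         # each data point is seen exactly once
        w += eta * x * (x @ w)          # w <- (I + eta * x x^T) w, in O(d) time
        w /= np.linalg.norm(w)          # renormalize for numerical stability
    return w


def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w (assumed rule), then renormalize."""
    keep = np.argpartition(np.abs(w), -s)[-s:]
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out / np.linalg.norm(out)


def sin_squared_error(v_hat, v):
    """sin^2 of the angle between two unit vectors."""
    return 1.0 - float(v_hat @ v) ** 2


def demo(n=5000, d=200, s=5, lam=4.0, seed=0):
    """Synthetic check with covariance I + lam * v v^T and an s-sparse v (assumed setup)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(d)
    v[:s] = 1.0 / np.sqrt(s)            # s-sparse principal eigenvector
    g = rng.standard_normal(n)
    X = rng.standard_normal((n, d)) + np.sqrt(lam) * g[:, None] * v[None, :]
    eta = np.log(n) / (n * lam)         # assumed step size: log(n) / (n * eigengap)
    w = oja_vector(X, eta, rng)
    return sin_squared_error(w, v), sin_squared_error(hard_threshold(w, s), v)


if __name__ == "__main__":
    err_dense, err_sparse = demo()
    print(f"sin^2 error, Oja vector:         {err_dense:.4f}")
    print(f"sin^2 error, thresholded vector: {err_sparse:.4f}")
```

In this toy setting the effective rank is (d + lam) / (1 + lam), which grows with d, so the unthresholded Oja vector carries error spread over all d coordinates while the thresholded vector only carries error on the s retained ones. The sketch assumes s is known; how the threshold is chosen in the paper, and under which regularity conditions the minimax bound holds, is not stated in the abstract.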
