Timezone: »

New Subsampling Algorithms for Fast Least Squares Regression
Paramveer Dhillon · Yichao Lu · Dean P Foster · Lyle Ungar

Sat Dec 07 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor
We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data ($n \gg p$). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of size of input i.e. O($np$) and our best method, {\it Uluru}, gives an error bound of $O(\sqrt{p/n})$ which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design (with Randomized Hadamard preconditioning) as well as sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d., sub-Gaussian then one can directly subsample without the expensive Randomized Hadamard preconditioning without loss of accuracy.

Author Information

Paramveer Dhillon (University of Pennsylvania)
Yichao Lu (University of Pennsylvania)
Dean P Foster (University of Pennsylvania)
Lyle Ungar (University of Pennsylvania)

More from the Same Authors