

Poster

Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

Cameron Musco · David Woodruff

Pacific Ballroom #208

Keywords: [ Kernel Methods ] [ Matrix and Tensor Factorization ] [ Hardness of Learning and Approximations ] [ Computational Complexity ]


Abstract: Low-rank approximation is a common tool used to accelerate kernel methods: the n×n kernel matrix K is approximated via a rank-k matrix K̃ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation. We show that for a broad class of kernels, including the popular Gaussian and polynomial kernels, computing a relative error k-rank approximation to K is at least as difficult as multiplying the input data matrix A ∈ ℝ^{n×d} by an arbitrary matrix C ∈ ℝ^{d×k}. Barring a breakthrough in fast matrix multiplication, when k is not too large, this requires Ω(nnz(A)·k) time, where nnz(A) is the number of non-zeros in A. This lower bound matches, in many parameter regimes, recent work on subquadratic time algorithms for low-rank approximation of general kernels [MM16, MW17], demonstrating that these algorithms are unlikely to be significantly improved, in particular to O(nnz(A)) input sparsity runtimes. At the same time there is hope: we show for the first time that O(nnz(A)) time approximation is possible for general radial basis function kernels (e.g., the Gaussian kernel) for the closely related problem of low-rank approximation of the kernelized dataset.
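
As a minimal illustration of the objects discussed in the abstract (not the paper's algorithm or its lower bound construction), the sketch below builds a Gaussian kernel matrix K from a data matrix A ∈ ℝ^{n×d} and computes its best rank-k approximation via an exact eigendecomposition; the names n, d, k, and sigma are illustrative placeholders.

```python
import numpy as np

# Minimal sketch, assuming a Gaussian (RBF) kernel with bandwidth sigma.
# This brute-force route forms the full n x n kernel matrix and uses an
# exact eigendecomposition; it is only meant to show what a rank-k
# approximation of K is, not an efficient algorithm.

rng = np.random.default_rng(0)
n, d, k, sigma = 500, 20, 10, 1.0          # illustrative sizes

A = rng.standard_normal((n, d))            # data matrix; nnz(A) non-zeros

# Gaussian kernel: K[i, j] = exp(-||a_i - a_j||^2 / (2 * sigma^2))
sq_norms = np.sum(A**2, axis=1)
sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T, 0.0)
K = np.exp(-sq_dists / (2.0 * sigma**2))

# Best rank-k approximation of K (Eckart-Young): keep the top-k eigenpairs.
# K is symmetric PSD, so eigh suffices.
eigvals, eigvecs = np.linalg.eigh(K)
top = np.argsort(eigvals)[::-1][:k]
K_tilde = eigvecs[:, top] @ np.diag(eigvals[top]) @ eigvecs[:, top].T

rel_err = np.linalg.norm(K - K_tilde) / np.linalg.norm(K)
print(f"relative Frobenius error of rank-{k} approximation: {rel_err:.3e}")
```

Note that this route costs O(n^3) time and requires materializing all n^2 kernel entries; the subquadratic algorithms of [MM16, MW17] and the input sparsity (O(nnz(A)) time) question studied in the paper are precisely about avoiding that cost.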
