Poster
Near-Optimal Entrywise Sampling for Data Matrices
Dimitris Achlioptas · Zohar Karnin · Edo Liberty
[
Abstract
]
2013 Poster
Abstract:
We consider the problem of independently sampling non-zero entries of a matrix in order to produce a sparse sketch of it, , that minimizes . For large matrices, such that (for example, representing observations over attributes) we give distributions exhibiting four important properties. First, they have closed forms for the probability of sampling each item which are computable from minimal information regarding . Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model.
Live content is unavailable. Log in and register to view live content