Timezone: »
Spotlight
One sketch for all: Theory and Application of Conditional Random Sampling
Ping Li · Kenneth W Church · Trevor Hastie
Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise ($l_2$, $l_1$) distances, in static, large-scale, and sparse data sets such as text and Web data. It was previously presented using a heuristic argument. This study extends CRS to handle dynamic or streaming data, which much better reflect the real-world situation than assuming static data. Compared with other known sketching algorithms for dimension reductions such as stable random projections, CRS exhibits a significant advantage in that it is ``one-sketch-for-all.'' In particular, we demonstrate that CRS can be applied to efficiently compute the $l_p$ distance and the Hilbertian metrics, both are popular in machine learning. Although a fully rigorous analysis of CRS is difficult, we prove that, with a simple modification, CRS is rigorous at least for an important application of computing Hamming norms. A generic estimator and an approximate variance formula are provided and tested on various applications, for computing Hamming norms, Hamming distances, and $\chi^2$ distances.
Author Information
Ping Li (Baidu Research USA)
Kenneth W Church (Microsoft)
Trevor Hastie (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
-
2008 Poster: One sketch for all: Theory and Application of Conditional Random Sampling »
Tue. Dec 9th through Mon the 8th Room
More from the Same Authors
-
2017 Poster: Partial Hard Thresholding: Towards A Principled Analysis of Support Recovery »
Jie Shen · Ping Li -
2017 Poster: Simple strategies for recovering inner products from coarsely quantized random projections »
Ping Li · Martin Slawski -
2016 Poster: Exact Recovery of Hard Thresholding Pursuit »
Xiaotong Yuan · Ping Li · Tong Zhang -
2016 Poster: Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation »
Xiaotong Yuan · Ping Li · Tong Zhang · Qingshan Liu · Guangcan Liu -
2016 Poster: Quantized Random Projections and Non-Linear Estimation of Cosine Similarity »
Ping Li · Michael Mitzenmacher · Martin Slawski -
2015 Poster: b-bit Marginal Regression »
Martin Slawski · Ping Li -
2015 Spotlight: b-bit Marginal Regression »
Martin Slawski · Ping Li -
2015 Poster: Regularization-Free Estimation in Trace Regression with Symmetric Positive Semidefinite Matrices »
Martin Slawski · Ping Li · Matthias Hein -
2014 Poster: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2014 Poster: Recovery of Coherent Data via Low-Rank Dictionary Pursuit »
Guangcan Liu · Ping Li -
2014 Poster: Online Optimization for Max-Norm Regularization »
Jie Shen · Huan Xu · Ping Li -
2014 Spotlight: Recovery of Coherent Data via Low-Rank Dictionary Pursuit »
Guangcan Liu · Ping Li -
2014 Oral: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2013 Poster: Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search »
Anshumali Shrivastava · Ping Li -
2013 Poster: Sign Cauchy Projections and Chi-Square Kernel »
Ping Li · Gennady Samorodnitsk · John Hopcroft -
2013 Poster: Compressive Feature Learning »
Hristo S Paskov · Robert West · John C Mitchell · Trevor Hastie -
2012 Poster: Entropy Estimations Using Correlated Symmetric Stable Random Projections »
Ping Li · Cun-Hui Zhang -
2012 Poster: One Permutation Hashing »
Ping Li · Art B Owen · Cun-Hui Zhang -
2011 Poster: Hashing Algorithms for Large-Scale Learning »
Ping Li · Anshumali Shrivastava · Joshua L Moore · Arnd C König -
2010 Spotlight: b-Bit Minwise Hashing for Estimating Three-Way Similarities »
Ping Li · Arnd C König · Wenhao Gui -
2010 Poster: b-Bit Minwise Hashing for Estimating Three-Way Similarities »
Ping Li · Arnd C König · Wenhao Gui -
2007 Spotlight: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting »
Ping Li · Chris J Burges · Qiang Wu -
2007 Poster: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting »
Ping Li · Chris J Burges · Qiang Wu -
2007 Poster: A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Sta »
Ping Li · Trevor Hastie -
2006 Poster: Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data »
Ping Li · Kenneth W Church · Trevor Hastie