Timezone: »
Jaccard similarity is widely used as a distance measure in many machine learning and search applications. Typically, hashing methods are essential for the use of Jaccard similarity to be practical in large-scale settings. For hashing binary (0/1) data, the idea of one permutation hashing (OPH) with densification significantly accelerates traditional minwise hashing algorithms while providing unbiased and accurate estimates. In this paper, we propose a strategy named “re-randomization” in the process of densification that could achieve the smallest variance among all densification schemes. The success of this idea naturally inspires us to generalize one permutation hashing to weighted (non-binary) data, which results in the socalled “bin-wise consistent weighted sampling (BCWS)” algorithm. We analyze the behavior of BCWS and compare it with a recent alternative. Extensive experiments on various datasets illustrates the effectiveness of our proposed methods.
Author Information
Ping Li (Baidu Research USA)
Xiaoyun Li (Rutgers University)
Cun-Hui Zhang (Rutgers)
More from the Same Authors
-
2022 Poster: On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond »
Xiaotong Yuan · Ping Li -
2022 Poster: Private Graph All-Pairwise-Shortest-Path Distance Release with Improved Error Rate »
Chenglin Fan · Ping Li · Xiaoyun Li -
2022 Poster: SignRFF: Sign Random Fourier Features »
Xiaoyun Li · Ping Li -
2022 Poster: Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class »
Khoa D Doan · Yingjie Lao · Ping Li -
2021 Poster: A Comprehensively Tight Analysis of Gradient Descent for PCA »
Zhiqiang Xu · Ping Li -
2021 Poster: Backdoor Attack with Imperceptible Input and Latent Modification »
Khoa Doan · Yingjie Lao · Ping Li -
2021 Poster: A Note on Sparse Generalized Eigenvalue Problem »
Yunfeng Cai · Guanhua Fang · Ping Li -
2021 Poster: Mitigating Forgetting in Online Continual Learning with Neuron Calibration »
Haiyan Yin · peng yang · Ping Li -
2021 Poster: Rate-Optimal Subspace Estimation on Random Graphs »
Zhixin Zhou · Fan Zhou · Ping Li · Cun-Hui Zhang -
2021 Poster: Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction »
Jing Zhang · Jianwen Xie · Nick Barnes · Ping Li -
2020 Poster: Thunder: a Fast Coordinate Selection Solver for Sparse Learning »
Shaogang Ren · Weijie Zhao · Ping Li -
2020 Poster: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Spotlight: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Poster: Towards Better Generalization of Adaptive Gradient Methods »
Yingxue Zhou · Belhal Karimi · Jinxing Yu · Zhiqiang Xu · Ping Li -
2020 Poster: Ratio Trace Formulation of Wasserstein Discriminant Analysis »
Hexuan Liu · Yunfeng Cai · You-Lin Chen · Ping Li -
2019 Poster: Outlier Detection and Robust PCA Using a Convex Measure of Innovation »
Mostafa Rahmani · Ping Li -
2019 Poster: Towards Practical Alternating Least-Squares for CCA »
Zhiqiang Xu · Ping Li -
2019 Poster: Generalization Error Analysis of Quantized Compressive Learning »
Xiaoyun Li · Ping Li -
2019 Spotlight: Generalization Error Analysis of Quantized Compressive Learning »
Xiaoyun Li · Ping Li -
2019 Poster: Möbius Transformation for Fast Inner Product Search on Graph »
Zhixin Zhou · Shulong Tan · Zhaozhuo Xu · Ping Li -
2019 Poster: Random Projections with Asymmetric Quantization »
Xiaoyun Li · Ping Li -
2017 Poster: Partial Hard Thresholding: Towards A Principled Analysis of Support Recovery »
Jie Shen · Ping Li -
2017 Poster: Simple strategies for recovering inner products from coarsely quantized random projections »
Ping Li · Martin Slawski -
2016 Poster: Exact Recovery of Hard Thresholding Pursuit »
Xiaotong Yuan · Ping Li · Tong Zhang -
2016 Poster: Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation »
Xiaotong Yuan · Ping Li · Tong Zhang · Qingshan Liu · Guangcan Liu -
2016 Poster: Quantized Random Projections and Non-Linear Estimation of Cosine Similarity »
Ping Li · Michael Mitzenmacher · Martin Slawski -
2015 Poster: b-bit Marginal Regression »
Martin Slawski · Ping Li -
2015 Spotlight: b-bit Marginal Regression »
Martin Slawski · Ping Li -
2015 Poster: Regularization-Free Estimation in Trace Regression with Symmetric Positive Semidefinite Matrices »
Martin Slawski · Ping Li · Matthias Hein -
2014 Poster: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2014 Poster: Recovery of Coherent Data via Low-Rank Dictionary Pursuit »
Guangcan Liu · Ping Li -
2014 Poster: Online Optimization for Max-Norm Regularization »
Jie Shen · Huan Xu · Ping Li -
2014 Spotlight: Recovery of Coherent Data via Low-Rank Dictionary Pursuit »
Guangcan Liu · Ping Li -
2014 Oral: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2013 Poster: Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search »
Anshumali Shrivastava · Ping Li -
2013 Poster: Sign Cauchy Projections and Chi-Square Kernel »
Ping Li · Gennady Samorodnitsk · John Hopcroft -
2012 Poster: Entropy Estimations Using Correlated Symmetric Stable Random Projections »
Ping Li · Cun-Hui Zhang -
2012 Poster: One Permutation Hashing »
Ping Li · Art B Owen · Cun-Hui Zhang -
2011 Poster: Hashing Algorithms for Large-Scale Learning »
Ping Li · Anshumali Shrivastava · Joshua L Moore · Arnd C König -
2010 Spotlight: b-Bit Minwise Hashing for Estimating Three-Way Similarities »
Ping Li · Arnd C König · Wenhao Gui -
2010 Poster: b-Bit Minwise Hashing for Estimating Three-Way Similarities »
Ping Li · Arnd C König · Wenhao Gui -
2008 Poster: One sketch for all: Theory and Application of Conditional Random Sampling »
Ping Li · Kenneth W Church · Trevor Hastie -
2008 Spotlight: One sketch for all: Theory and Application of Conditional Random Sampling »
Ping Li · Kenneth W Church · Trevor Hastie -
2007 Spotlight: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting »
Ping Li · Chris J Burges · Qiang Wu -
2007 Poster: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting »
Ping Li · Chris J Burges · Qiang Wu -
2007 Poster: A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Sta »
Ping Li · Trevor Hastie -
2006 Poster: Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data »
Ping Li · Kenneth W Church · Trevor Hastie