Timezone: »
Poster
SignRFF: Sign Random Fourier Features
Xiaoyun Li · Ping Li
The industry practice has been moving to embedding based retrieval (EBR). For example, in many applications, the embedding vectors are trained by some form of two-tower models. During serving phase, candidates (embedding vectors) are retrieved according to the rankings of cosine similarities either exhaustively or by approximate near neighbor (ANN) search algorithms. For those applications, it is natural to apply ``sign random projections'' (SignRP) or variants, on the trained embedding vectors to facilitate efficient data storage and cosine distance computations. SignRP is also one of the standard indexing schemes for conducting approximate near neighbor search. In the literature, SignRP has been popular and, to an extent, becomes the default method for ``locality sensitive hashing'' (LSH). In this paper, we propose ``sign random Fourier features'' (SignRFF) as an alternative to SignRP. The original method of random Fourier features (RFF) is a standard technique for approximating the Gaussian kernel (as opposed to the linear cosine kernel), in the literature of large-scale machine learning. Basically, RFF applies a simple nonlinear transformation on the samples generated by random projections (RP). Thus, in the pipeline of EBR, it is straightforward to replace SignRP by SignRFF. This paper explains, in a principled manner, why it makes sense to do so. In this paper, a new analytical measure called \textbf{Ranking Efficiency (RE)} is developed, which in retrospect is closely related to the ``two-sample mean'' $t$-test statistic for binomial variables. RE provides a systematic and unified framework for comparing different LSH methods. We compare our proposed SignRP with SignRP, KLSH (kernel LSH), as well SQ-RFF (which is another 1-bit coding scheme for RFF). According to the RE expression, SignRFF consistently outperforms KLSH (for Gaussian kernel) and SQ-RFF. SignRFF also outperforms SignRP in the relatively high similarity region. The theoretical comparison results are consistent with our empirical findings. In addition, experiments are conducted to compare SignRFF with a wide range of data-dependent and deep learning based hashing methods and show the advantage of SignRFF with a sufficient number of hash bits.
Author Information
Xiaoyun Li (Rutgers University)
Xiaoyun Li is a 4th year PhD student from the Department of Statistics at Rutgers University, supervised by Professor Ping Li. His research interests include statistical learning, hashing, randomized algorithms and optimization.
Ping Li (Baidu Research USA)
More from the Same Authors
-
2022 Poster: On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond »
Xiaotong Yuan · Ping Li -
2022 Poster: Private Graph All-Pairwise-Shortest-Path Distance Release with Improved Error Rate »
Chenglin Fan · Ping Li · Xiaoyun Li -
2022 Poster: Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class »
Khoa D Doan · Yingjie Lao · Ping Li -
2021 Poster: A Comprehensively Tight Analysis of Gradient Descent for PCA »
Zhiqiang Xu · Ping Li -
2021 Poster: Backdoor Attack with Imperceptible Input and Latent Modification »
Khoa Doan · Yingjie Lao · Ping Li -
2021 Poster: A Note on Sparse Generalized Eigenvalue Problem »
Yunfeng Cai · Guanhua Fang · Ping Li -
2021 Poster: Mitigating Forgetting in Online Continual Learning with Neuron Calibration »
Haiyan Yin · peng yang · Ping Li -
2021 Poster: Rate-Optimal Subspace Estimation on Random Graphs »
Zhixin Zhou · Fan Zhou · Ping Li · Cun-Hui Zhang -
2021 Poster: Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction »
Jing Zhang · Jianwen Xie · Nick Barnes · Ping Li -
2020 Poster: Thunder: a Fast Coordinate Selection Solver for Sparse Learning »
Shaogang Ren · Weijie Zhao · Ping Li -
2020 Poster: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Spotlight: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Poster: Towards Better Generalization of Adaptive Gradient Methods »
Yingxue Zhou · Belhal Karimi · Jinxing Yu · Zhiqiang Xu · Ping Li -
2020 Poster: Ratio Trace Formulation of Wasserstein Discriminant Analysis »
Hexuan Liu · Yunfeng Cai · You-Lin Chen · Ping Li -
2019 Poster: Outlier Detection and Robust PCA Using a Convex Measure of Innovation »
Mostafa Rahmani · Ping Li -
2019 Poster: Towards Practical Alternating Least-Squares for CCA »
Zhiqiang Xu · Ping Li -
2019 Poster: Generalization Error Analysis of Quantized Compressive Learning »
Xiaoyun Li · Ping Li -
2019 Spotlight: Generalization Error Analysis of Quantized Compressive Learning »
Xiaoyun Li · Ping Li -
2019 Poster: Möbius Transformation for Fast Inner Product Search on Graph »
Zhixin Zhou · Shulong Tan · Zhaozhuo Xu · Ping Li -
2019 Poster: Random Projections with Asymmetric Quantization »
Xiaoyun Li · Ping Li -
2019 Poster: Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling »
Ping Li · Xiaoyun Li · Cun-Hui Zhang -
2017 Poster: Partial Hard Thresholding: Towards A Principled Analysis of Support Recovery »
Jie Shen · Ping Li -
2017 Poster: Simple strategies for recovering inner products from coarsely quantized random projections »
Ping Li · Martin Slawski -
2016 Poster: Exact Recovery of Hard Thresholding Pursuit »
Xiaotong Yuan · Ping Li · Tong Zhang -
2016 Poster: Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation »
Xiaotong Yuan · Ping Li · Tong Zhang · Qingshan Liu · Guangcan Liu -
2016 Poster: Quantized Random Projections and Non-Linear Estimation of Cosine Similarity »
Ping Li · Michael Mitzenmacher · Martin Slawski -
2015 Poster: b-bit Marginal Regression »
Martin Slawski · Ping Li -
2015 Spotlight: b-bit Marginal Regression »
Martin Slawski · Ping Li -
2015 Poster: Regularization-Free Estimation in Trace Regression with Symmetric Positive Semidefinite Matrices »
Martin Slawski · Ping Li · Matthias Hein -
2014 Poster: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2014 Poster: Recovery of Coherent Data via Low-Rank Dictionary Pursuit »
Guangcan Liu · Ping Li -
2014 Poster: Online Optimization for Max-Norm Regularization »
Jie Shen · Huan Xu · Ping Li -
2014 Spotlight: Recovery of Coherent Data via Low-Rank Dictionary Pursuit »
Guangcan Liu · Ping Li -
2014 Oral: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2013 Poster: Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search »
Anshumali Shrivastava · Ping Li -
2013 Poster: Sign Cauchy Projections and Chi-Square Kernel »
Ping Li · Gennady Samorodnitsk · John Hopcroft -
2012 Poster: Entropy Estimations Using Correlated Symmetric Stable Random Projections »
Ping Li · Cun-Hui Zhang -
2012 Poster: One Permutation Hashing »
Ping Li · Art B Owen · Cun-Hui Zhang -
2011 Poster: Hashing Algorithms for Large-Scale Learning »
Ping Li · Anshumali Shrivastava · Joshua L Moore · Arnd C König -
2010 Spotlight: b-Bit Minwise Hashing for Estimating Three-Way Similarities »
Ping Li · Arnd C König · Wenhao Gui -
2010 Poster: b-Bit Minwise Hashing for Estimating Three-Way Similarities »
Ping Li · Arnd C König · Wenhao Gui -
2008 Poster: One sketch for all: Theory and Application of Conditional Random Sampling »
Ping Li · Kenneth W Church · Trevor Hastie -
2008 Spotlight: One sketch for all: Theory and Application of Conditional Random Sampling »
Ping Li · Kenneth W Church · Trevor Hastie -
2007 Spotlight: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting »
Ping Li · Chris J Burges · Qiang Wu -
2007 Poster: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting »
Ping Li · Chris J Burges · Qiang Wu -
2007 Poster: A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Sta »
Ping Li · Trevor Hastie -
2006 Poster: Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data »
Ping Li · Kenneth W Church · Trevor Hastie