Stochastic Gradient Descent (SGD) is the most popular optimization algorithm for large-scale problems. SGD estimates the gradient by uniform sampling with sample size one. Several works suggest faster epoch-wise convergence by using weighted non-uniform sampling for better gradient estimates. Unfortunately, the per-iteration cost of maintaining this adaptive distribution for gradient estimation is more than calculating the full gradient itself, which we call the chicken-and-egg loop. As a result, the false impression of faster convergence in iterations, in reality, leads to slower convergence in time. In this paper, we break this barrier by providing the first demonstration of a scheme, Locality Sensitive Hashing (LSH) sampled Stochastic Gradient Descent (LGD), which leads to superior gradient estimation while keeping the sampling cost per iteration similar to that of uniform sampling. Such an algorithm is possible due to the sampling view of LSH, which came to light recently. As a consequence of superior and fast estimation, we reduce the running time of all existing gradient descent algorithms that rely on gradient estimates, including Adam, AdaGrad, etc. We demonstrate the effectiveness of our proposal with experiments on linear models as well as the non-linear BERT, a recent popular deep-learning-based language representation model.
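The core idea above can be illustrated with a minimal sketch. The code below is not the paper's LGD algorithm; it only shows how non-uniform sampling with importance reweighting by 1/(n·p_i) yields an unbiased gradient estimate, and why maintaining the adaptive distribution (here, probabilities proportional to per-sample gradient norms) is the expensive step that LSH-based sampling is designed to avoid. All variable names and the linear-regression setup are illustrative assumptions.

```python
import numpy as np

# Synthetic linear-regression problem (illustrative only).
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # current SGD iterate

def per_sample_grad(i, w):
    # Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w.
    return (X[i] @ w - y[i]) * X[i]

# "Adaptive" sampling distribution: probability proportional to the
# per-sample gradient norm. Computing this exactly costs as much as a
# full gradient pass -- the chicken-and-egg loop the abstract describes.
norms = np.array([np.linalg.norm(per_sample_grad(i, w)) for i in range(n)])
p = norms / norms.sum()

def sampled_grad_estimate(w, p):
    # Draw one index from the non-uniform distribution and reweight by
    # 1/(n * p_i), which keeps the estimator unbiased for any p.
    i = rng.choice(n, p=p)
    return per_sample_grad(i, w) / (n * p[i])

full_grad = np.mean([per_sample_grad(i, w) for i in range(n)], axis=0)
# Averaging many one-sample estimates should recover the full gradient.
est = np.mean([sampled_grad_estimate(w, p) for _ in range(20000)], axis=0)
```

Because each reweighted sample has the same norm under gradient-norm-proportional sampling, the estimator's variance is lower than under uniform sampling; the paper's contribution is obtaining such a distribution cheaply via LSH rather than recomputing `norms` every iteration.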
Author Information
Beidi Chen (Rice University)
I'm a third-year Ph.D. student at Rice University working with Dr. Anshumali Shrivastava. My research topic is hashing in large-scale learning. I work closely with Dr. Rebecca Steorts on record linkage. I did my undergrad at Berkeley, where my advisor was Randy Katz and my topic was data mining.
Yingchen Xu (Airbnb)
Anshumali Shrivastava (Rice University)
More from the Same Authors
-
2021 Spotlight: Practical Near Neighbor Search via Group Testing »
Joshua Engels · Benjamin Coleman · Anshumali Shrivastava -
2021 : PISTACHIO: Patch Importance Sampling To Accelerate CNNs via a Hash Index Optimizer »
Zhaozhuo Xu · Anshumali Shrivastava -
2022 : Adaptive Sparse Federated Learning in Large Output Spaces via Hashing »
Zhaozhuo Xu · Luyang Liu · Zheng Xu · Anshumali Shrivastava -
2023 Poster: DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries »
Joshua Engels · Benjamin Coleman · Vihan Lakshman · Anshumali Shrivastava -
2023 Poster: One-Pass Distribution Sketch for Measuring Data Heterogeneity in Federated Learning »
Zichang Liu · Zhaozhuo Xu · Benjamin Coleman · Anshumali Shrivastava -
2023 Poster: Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time »
Zichang Liu · Aditya Desai · Fangshuo Liao · Weitao Wang · Victor Xie · Zhaozhuo Xu · Anastasios Kyrillidis · Anshumali Shrivastava -
2022 Poster: The trade-offs of model size in large recommendation models : 100GB to 10MB Criteo-tb DLRM model »
Aditya Desai · Anshumali Shrivastava -
2022 Poster: Retaining Knowledge for Learning with Dynamic Definition »
Zichang Liu · Benjamin Coleman · Tianyi Zhang · Anshumali Shrivastava -
2022 Poster: Graph Reordering for Cache-Efficient Near Neighbor Search »
Benjamin Coleman · Santiago Segarra · Alexander Smola · Anshumali Shrivastava -
2021 Poster: Scatterbrain: Unifying Sparse and Low-rank Attention »
Beidi Chen · Tri Dao · Eric Winsor · Zhao Song · Atri Rudra · Christopher Ré -
2021 Poster: Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures »
Zhaozhuo Xu · Zhao Song · Anshumali Shrivastava -
2021 Poster: Practical Near Neighbor Search via Group Testing »
Joshua Engels · Benjamin Coleman · Anshumali Shrivastava -
2021 Poster: Locality Sensitive Teaching »
Zhaozhuo Xu · Beidi Chen · Chaojian Li · Weiyang Liu · Le Song · Yingyan Lin · Anshumali Shrivastava -
2021 Poster: Raw Nav-merge Seismic Data to Subsurface Properties with MLP based Multi-Modal Information Unscrambler »
Aditya Desai · Zhaozhuo Xu · Menal Gupta · Anu Chandran · Antoine Vial-Aussavy · Anshumali Shrivastava -
2020 Poster: Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web »
Zhenwei Dai · Anshumali Shrivastava -
2020 Session: Orals & Spotlights Track 03: Language/Audio Applications »
Anshumali Shrivastava · Dilek Hakkani-Tur -
2019 : Posters and Coffee »
Sameer Kumar · Tomasz Kornuta · Oleg Bakhteev · Hui Guan · Xiaomeng Dong · Minsik Cho · Sören Laue · Theodoros Vasiloudis · Andreea Anghel · Erik Wijmans · Zeyuan Shang · Oleksii Kuchaiev · Ji Lin · Susan Zhang · Ligeng Zhu · Beidi Chen · Vinu Joseph · Jialin Ding · Jonathan Raiman · Ahnjae Shin · Vithursan Thangarasa · Anush Sankaran · Akhil Mathur · Martino Dazzi · Markus Löning · Darryl Ho · Emanuel Zgraggen · Supun Nakandala · Tomasz Kornuta · Rita Kuznetsova -
2019 Poster: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products »
Tharun Kumar Reddy Medini · Qixuan Huang · Yiqiu Wang · Vijai Mohan · Anshumali Shrivastava -
2018 Poster: Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements »
Ankush Mandal · He Jiang · Anshumali Shrivastava · Vivek Sarkar -
2017 : Poster Session (encompasses coffee break) »
Beidi Chen · Borja Balle · Daniel Lee · iuri frosio · Jitendra Malik · Jan Kautz · Ke Li · Masashi Sugiyama · Miguel A. Carreira-Perpinan · Ramin Raziperchikolaei · Theja Tulabandhula · Yung-Kyun Noh · Adams Wei Yu -
2017 : Contributed talk: Unique Entity Estimation with Application to the Syrian Conflict »
Beidi Chen -
2016 Poster: Simple and Efficient Weighted Minwise Hashing »
Anshumali Shrivastava -
2014 Poster: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2014 Oral: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) »
Anshumali Shrivastava · Ping Li -
2013 Poster: Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search »
Anshumali Shrivastava · Ping Li -
2011 Poster: Hashing Algorithms for Large-Scale Learning »
Ping Li · Anshumali Shrivastava · Joshua L Moore · Arnd C König