Poster
The trade-offs of model size in large recommendation models: 100GB to 10MB Criteo-tb DLRM model
Aditya Desai · Anshumali Shrivastava
Embedding tables dominate the size of industrial-scale recommendation models, using up to terabytes of memory. A popular MLPerf benchmark, and the largest publicly available machine learning benchmark on recommendation data, is a Deep Learning Recommendation Model (DLRM) trained on a terabyte of click-through data (Criteo-tb). It contains 100GB of embedding memory (25+ billion parameters). Because of their sheer size and the associated volume of data, DLRMs are difficult to train and to deploy for inference, and their large embedding tables create memory bottlenecks. This paper analyzes and extensively evaluates a generic parameter-sharing setup (PSS) for compressing DLRM models. We show theoretical upper bounds on the learnable memory required to approximate the embedding table. Our bounds indicate that exponentially fewer parameters suffice for a good approximation. Accordingly, we demonstrate a PSS DLRM that reaches 10000$\times$ compression on Criteo-tb without losing quality. Such compression, however, comes with a caveat: it requires 4.5$\times$ more iterations to reach the same saturation quality. The paper argues that this trade-off needs more investigation, as it might be significantly favorable. Leveraging the small size of the compressed model, we show a 4.3$\times$ improvement in training latency, leading to similar overall training times. Thus, in the trade-off between the system advantages of a small DLRM model and its slower convergence, we show that the scales tip towards the smaller model, which offers the same quality, faster inference, easier deployment, and similar training times.
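The numbers in the abstract and the general idea of parameter sharing can be made concrete with a small sketch. The code below is a hypothetical illustration, not the authors' PSS implementation: it assumes fp32 (4-byte) parameters, assumes the 4.3$\times$ latency gain applies per iteration, and uses a simple hashing-trick scheme in which every entry of a virtual embedding table is mapped into one small shared weight array. The class `HashedEmbedding`, its parameters (`memory`, `seed`), and the choice of hash are all assumptions made for illustration.

```python
import numpy as np

# Back-of-envelope arithmetic from the abstract (assumes fp32, i.e. 4 bytes per parameter).
BYTES_PER_PARAM = 4
full_params = 25_000_000_000                 # "25+ billion" embedding parameters
full_bytes = full_params * BYTES_PER_PARAM   # 100e9 bytes = 100GB
compressed_bytes = 10_000_000                # 10MB
compression = full_bytes / compressed_bytes  # 10000x

# 4.5x more iterations, each assumed ~4.3x faster => about 1.05x the wall-clock time.
relative_training_time = 4.5 / 4.3


class HashedEmbedding:
    """Hypothetical parameter-sharing embedding (hashing-trick style).

    A virtual num_rows x dim table is never stored; each (row, column) entry is
    hashed into a single shared weight array holding only `memory` floats.
    """

    P = 2_147_483_647  # Mersenne prime 2^31 - 1 for a Carter-Wegman style hash

    def __init__(self, num_rows, dim, memory, seed=0):
        rng = np.random.default_rng(seed)
        self.num_rows, self.dim, self.memory = num_rows, dim, memory
        self.weights = rng.normal(0.0, 0.01, size=memory).astype(np.float32)
        self.a = int(rng.integers(1, self.P))  # hash multiplier (assumed choice)
        self.b = int(rng.integers(0, self.P))  # hash offset (assumed choice)

    def _slots(self, rows):
        """Map every (row, column) entry of the requested rows to a shared slot."""
        rows = np.asarray(rows, dtype=np.int64)
        cols = np.arange(self.dim, dtype=np.int64)
        flat = rows[:, None] * self.dim + cols[None, :]  # virtual flat index
        # Reduce mod P first so a * x stays below 2^62 and cannot overflow int64.
        return ((self.a * (flat % self.P) + self.b) % self.P) % self.memory

    def lookup(self, rows):
        """Return a (len(rows), dim) array gathered from the shared weights."""
        return self.weights[self._slots(rows)]

    def sgd_step(self, rows, grad, lr=0.1):
        """Plain SGD: entries that share a slot accumulate their gradients there."""
        update = (-lr * np.asarray(grad)).astype(np.float32).ravel()
        np.add.at(self.weights, self._slots(rows).ravel(), update)


# Virtual table of 10M rows x 64 dims backed by 2.5M shared fp32 weights (10MB).
emb = HashedEmbedding(num_rows=10_000_000, dim=64, memory=2_500_000)
vecs = emb.lookup([0, 42, 9_999_999])
print(vecs.shape)                                     # (3, 64)
print(emb.weights.nbytes)                             # 10000000
print(compression, round(relative_training_time, 2))  # 10000.0 1.05
```

In this sketch, memory scales with `memory` rather than with `num_rows * dim`, and lookups touch only the shared array. The price is that unrelated entries can collide and share weights. The arithmetic reproduces the abstract's 100GB to 10MB ratio (10000$\times$) and shows why 4.5$\times$ more iterations at 4.3$\times$ lower latency yields roughly the same total training time.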
Author Information
Aditya Desai (Rice University)
Anshumali Shrivastava (Rice University / ThirdAI Corp.)
More from the Same Authors
2021 Spotlight: Practical Near Neighbor Search via Group Testing
Joshua Engels · Benjamin Coleman · Anshumali Shrivastava
2021: PISTACHIO: Patch Importance Sampling To Accelerate CNNs via a Hash Index Optimizer
Zhaozhuo Xu · Anshumali Shrivastava
2022: Adaptive Sparse Federated Learning in Large Output Spaces via Hashing
Zhaozhuo Xu · Luyang Liu · Zheng Xu · Anshumali Shrivastava
2023 Poster: DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
Joshua Engels · Benjamin Coleman · Vihan Lakshman · Anshumali Shrivastava
2023 Poster: One-Pass Distribution Sketch for Measuring Data Heterogeneity in Federated Learning
Zichang Liu · Zhaozhuo Xu · Benjamin Coleman · Anshumali Shrivastava
2023 Poster: Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu · Aditya Desai · Fangshuo Liao · Weitao Wang · Victor Xie · Zhaozhuo Xu · Anastasios Kyrillidis · Anshumali Shrivastava
2022 Poster: Retaining Knowledge for Learning with Dynamic Definition
Zichang Liu · Benjamin Coleman · Tianyi Zhang · Anshumali Shrivastava
2022 Poster: Graph Reordering for Cache-Efficient Near Neighbor Search
Benjamin Coleman · Santiago Segarra · Alexander Smola · Anshumali Shrivastava
2021 Poster: Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures
Zhaozhuo Xu · Zhao Song · Anshumali Shrivastava
2021 Poster: Practical Near Neighbor Search via Group Testing
Joshua Engels · Benjamin Coleman · Anshumali Shrivastava
2021 Poster: Locality Sensitive Teaching
Zhaozhuo Xu · Beidi Chen · Chaojian Li · Weiyang Liu · Le Song · Yingyan Lin · Anshumali Shrivastava
2021 Poster: Raw Nav-merge Seismic Data to Subsurface Properties with MLP based Multi-Modal Information Unscrambler
Aditya Desai · Zhaozhuo Xu · Menal Gupta · Anu Chandran · Antoine Vial-Aussavy · Anshumali Shrivastava
2020 Poster: Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web
Zhenwei Dai · Anshumali Shrivastava
2020 Session: Orals & Spotlights Track 03: Language/Audio Applications
Anshumali Shrivastava · Dilek Hakkani-Tur
2019 Poster: Fast and Accurate Stochastic Gradient Estimation
Beidi Chen · Yingchen Xu · Anshumali Shrivastava
2019 Poster: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
Tharun Kumar Reddy Medini · Qixuan Huang · Yiqiu Wang · Vijai Mohan · Anshumali Shrivastava
2018 Poster: Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements
Ankush Mandal · He Jiang · Anshumali Shrivastava · Vivek Sarkar
2016 Poster: Simple and Efficient Weighted Minwise Hashing
Anshumali Shrivastava
2014 Poster: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)
Anshumali Shrivastava · Ping Li
2014 Oral: Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)
Anshumali Shrivastava · Ping Li
2013 Poster: Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search
Anshumali Shrivastava · Ping Li
2011 Poster: Hashing Algorithms for Large-Scale Learning
Ping Li · Anshumali Shrivastava · Joshua L Moore · Arnd C König