State-of-the-art approximate nearest neighbor search (ANNS) algorithms face a fundamental tradeoff between query latency and accuracy because of limited main memory capacity: to keep indices in main memory for low query latency, ANNS algorithms must either limit the dataset size or use a quantization scheme, which hurts search accuracy. The emergence of heterogeneous memory (HM) offers a way to significantly increase memory capacity and break this tradeoff: with HM, billions of data points can be placed in main memory on a single machine without any data compression. However, HM consists of both fast (but small) and slow (but large) memory, and using it inappropriately slows down queries significantly. In this work, we present HM-ANN, a novel graph-based similarity search algorithm that takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression. On two billion-scale datasets, BIGANN and DEEP1B, HM-ANN outperforms state-of-the-art compression-based solutions such as L&C and IMI+OPQ in recall-vs-latency by a large margin, obtaining 46% higher recall under the same search latency. We also extend existing graph-based methods such as HNSW and NSG with two strong baseline implementations on HM. At billion-point scale, HM-ANN is 2X and 5.8X faster than our HNSW and NSG baselines, respectively, in reaching the same accuracy.
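To make the graph-based, two-tier idea concrete, the sketch below shows a simplified two-stage greedy graph search in Python: a small top-layer graph (the part one would keep in fast memory) is searched first to find a good entry point, and the full bottom-layer graph (the part that would reside in slow memory) is then searched starting from that entry point. This is only an illustration of the general search pattern suggested by the abstract, not the authors' implementation; the data layout, function names, and parameters (ef_top, ef_bottom) are assumptions.

```python
# Illustrative sketch (not the HM-ANN implementation): two-stage greedy search
# over a two-tier graph index. Graphs are adjacency lists: dict[int, list[int]].
import heapq
import numpy as np

def l2(a, b):
    return float(np.sum((a - b) ** 2))

def greedy_search(graph, vectors, query, entry, ef):
    """Best-first search; returns up to ef (distance, id) pairs, closest first."""
    d0 = l2(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]      # min-heap of nodes to expand
    results = [(-d0, entry)]        # max-heap (negated) of best ef found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                   # closest candidate is worse than worst kept result
        for nb in graph.get(node, ()):
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(vectors[nb], query)
            if dn < -results[0][0] or len(results) < ef:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-d, i) for d, i in results)

def search_two_tier(top_graph, bottom_graph, vectors, query, entry,
                    ef_top=16, ef_bottom=64, k=10):
    # Stage 1: cheap navigation in the small top-layer graph
    # (assumed to live in fast memory).
    top_hits = greedy_search(top_graph, vectors, query, entry, ef_top)
    # Stage 2: refinement in the full bottom-layer graph
    # (assumed to live in slow memory), starting from the best entry found above.
    best_entry = top_hits[0][1]
    bottom_hits = greedy_search(bottom_graph, vectors, query, best_entry, ef_bottom)
    return [i for _, i in bottom_hits[:k]]
```

Intuitively, only the small stage-1 structure needs to sit in latency-critical fast memory; the expensive stage-2 traversal touches slow memory but starts close to the answer, which keeps the number of slow-memory accesses low. How HM-ANN actually places and promotes data across the two tiers is detailed in the paper.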
Author Information
Jie Ren (University of California, Merced)
Minjia Zhang (Microsoft)
Dong Li (University of California, Merced)
More from the Same Authors
- 2023: DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
  Shuaiwen Song · Bonnie Kruft · Minjia Zhang · Conglong Li · Shiyang Chen · Chengming Zhang · Masahiro Tanaka · Xiaoxia Wu · Mohammed AlQuraishi · Gustaf Ahdritz · Christina Floristean · Rick Stevens · Venkatram Vishwanath · Arvind Ramanathan · Sam Foreman · Kyle Hippe · Prasanna Balaprakash · Yuxiong He
- 2023: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
  Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao
- 2023: DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
  Conglong Li · Zhewei Yao · Xiaoxia Wu · Minjia Zhang · Connor Holmes · Cheng Li · Yuxiong He
- 2023: Interactive Panel Discussion
  Nazneen Rajani · Tanya Roosta · Tim Dettmers · Minjia Zhang
- 2022 Spotlight: ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
  Zhewei Yao · Reza Yazdani Aminabadi · Minjia Zhang · Xiaoxia Wu · Conglong Li · Yuxiong He
- 2022 Spotlight: Lightning Talks 5B-2
  Conglong Li · Mohammad Azizmalayeri · Mojan Javaheripi · Pratik Vaishnavi · Jon Hasselgren · Hao Lu · Kevin Eykholt · Arshia Soltani Moakhar · Wenze Liu · Gustavo de Rosa · Nikolai Hofmann · Minjia Zhang · Zixuan Ye · Jacob Munkberg · Amir Rahmati · Arman Zarei · Subhabrata Mukherjee · Yuxiong He · Shital Shah · Reihaneh Zohrabi · Hongtao Fu · Tomasz Religa · Yuliang Liu · Mohammad Manzuri · Mohammad Hossein Rohban · Zhiguo Cao · Caio Cesar Teodoro Mendes · Sebastien Bubeck · Farinaz Koushanfar · Debadeepta Dey
- 2022 Spotlight: The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models
  Conglong Li · Minjia Zhang · Yuxiong He
- 2022 Panel: Panel 2B-4: Extreme Compression for… & Exploring Length Generalization…
  Cem Anil · Minjia Zhang
- 2022 Poster: ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
  Zhewei Yao · Reza Yazdani Aminabadi · Minjia Zhang · Xiaoxia Wu · Conglong Li · Yuxiong He
- 2022 Poster: The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models
  Conglong Li · Minjia Zhang · Yuxiong He
- 2022 Poster: XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient
  Xiaoxia Wu · Zhewei Yao · Minjia Zhang · Conglong Li · Yuxiong He
- 2021 Poster: NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
  Connor Holmes · Minjia Zhang · Yuxiong He · Bo Wu
- 2020 Poster: Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  Minjia Zhang · Yuxiong He
- 2020 Poster: AdaTune: Adaptive Tensor Program Compilation Made Efficient
  Menghao Li · Minjia Zhang · Chi Wang · Mingqin Li
- 2018 Poster: Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
  Minjia Zhang · Wenhan Wang · Xiaodong Liu · Jianfeng Gao · Yuxiong He