Search All 2024 Events
14 Results

Page 1 of 2
Workshop
Residual vector quantization for KV cache compression in large language model
Ankur Kumar
Poster
Wed 16:30 MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu · Jing Liu · Zizheng Pan · Yefei He · Reza Haffari · Bohan Zhuang
Poster
Wed 11:00 ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Yefei He · Luoming Zhang · Weijia Wu · Jing Liu · Hong Zhou · Bohan Zhuang
Workshop
A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression
Alessio Devoto · Yu Zhao · Simone Scardapane · Pasquale Minervini
Workshop
Sat 12:00 Scheduling in LLM Inference with Blowed-up Memory Constraints
Zijie Zhou · Jiashuo Jiang
Poster
Wed 11:00 KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Tianyi Zhang · Jonah Yi · Zhaozhuo Xu · Anshumali Shrivastava
Workshop
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
Rongzhi Zhang · Kuan Wang · Liyuan Liu · Shuohang Wang · Hao Cheng · Chao Zhang · Yelong Shen
Workshop
SharedContextBench: How Lossy are Long-context Methods in KV Cache Reuse
Yucheng Li · Huiqiang Jiang · Qianhui Wu · Xufang Luo · Surin Ahn · Chengruidong Zhang · Amir Abdi · Dongsheng Li · Jianfeng Gao · Yuqing Yang · Lili Qiu
Workshop
LSH-E Tells You What To Discard: An Adaptive Locality-Sensitive Strategy for KV Cache Compression
Tahseen Rabbani · Minghui Liu · Tony O'Halloran · Ananth Sankaralingam · Mary-Anne Hartley · Furong Huang
Poster
Thu 11:00 KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Coleman Hooper · Sehoon Kim · Hiva Mohammadzadeh · Michael Mahoney · Sophia Shao · Kurt Keutzer · Amir Gholami
Workshop
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
Luning Wang · Shiyao Li · Xuefei Ning · Zhihang Yuan · Shengen Yan · Guohao Dai · Yu Wang