14 Results
Workshop | Residual vector quantization for KV cache compression in large language model | Ankur Kumar
Poster | Wed 16:30 | MiniCache: KV Cache Compression in Depth Dimension for Large Language Models | Akide Liu · Jing Liu · Zizheng Pan · Yefei He · Reza Haffari · Bohan Zhuang
Poster | Wed 11:00 | ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification | Yefei He · Luoming Zhang · Weijia Wu · Jing Liu · Hong Zhou · Bohan Zhuang
Workshop | A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression | Alessio Devoto · Yu Zhao · Simone Scardapane · Pasquale Minervini
Workshop | Sat 12:00 | Scheduling in LLM Inference with Blowed-up Memory Constraints | Zijie Zhou · Jiashuo Jiang
Poster | Wed 11:00 | KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization | Tianyi Zhang · Jonah Yi · Zhaozhuo Xu · Anshumali Shrivastava
Workshop | LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy | Rongzhi Zhang · Kuan Wang · Liyuan Liu · Shuohang Wang · Hao Cheng · Chao Zhang · Yelong Shen
Workshop | SharedContextBench: How Lossy are Long-context Methods in KV Cache Reuse | Yucheng LI · Huiqiang Jiang · Qianhui Wu · Xufang Luo · Surin Ahn · Chengruidong Zhang · Amir Abdi · Dongsheng Li · Jianfeng Gao · Yuqing Yang · Lili Qiu
Workshop | LSH-E Tells You What To Discard: An Adaptive Locality-Sensitive Strategy for KV Cache Compression | Tahseen Rabbani · Minghui Liu · Tony O Halloran · Ananth Sankaralingam · Mary-Anne Hartley · Furong Huang
Poster | Thu 11:00 | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Coleman Hooper · Sehoon Kim · Hiva Mohammadzadeh · Michael Mahoney · Sophia Shao · Kurt Keutzer · Amir Gholami
Workshop | CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios | Luning Wang · Shiyao Li · Xuefei Ning · Zhihang Yuan · Shengen Yan · Guohao Dai · Yu Wang