Workshop · Sat 13:48
Inference-Friendly Models With MixAttention
Shashank Rajput · Ying Sheng · Sean Owen · Vitaliy Chiley

Workshop
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao · Yanshuai Cao · Lili Mou

Workshop
Data-Efficient Variational Mutual Information Estimation via Bayesian Self-Consistency
Desi R Ivanova · Marvin Schmitt · Stefan Radev

Workshop
A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression
Alessio Devoto · Yu Zhao · Simone Scardapane · Pasquale Minervini

Poster · Fri 11:00
PrivCirNet: Efficient Private Inference via Block Circulant Transformation
Tianshi Xu · Lemeng Wu · Runsheng Wang · Meng Li

Workshop
Scaling laws for post-training quantized large language models
Zifei Xu · Alexander Lan · Wanzin Yazar · Tristan Webb · Sayeh Sharify · Xin Wang

Workshop
Snakes and Ladders: Accelerating SSM Inference with Speculative Decoding
Yangchao Wu · Yonatan Dukler · Matthew Trager · Alessandro Achille · Wei Xia · Stefano Soatto

Workshop
Skip Transformers: Efficient Inference through Skip-Routing
Matthew Peroni · Dimitris Bertsimas

Workshop
Residual vector quantization for KV cache compression in large language model
Ankur Kumar

Expo Talk Panel · Wed 16:30
Logarithmic Math in accurate and efficient AI inference accelerators

Workshop
Eagle: Efficient Training-Free Router for Multi-LLM Inference
Zesen Zhao · Shuowei Jin · Zhuoqing Morley Mao

Workshop
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Jonathan Mamou · Oren Pereg · Daniel Korat · Moshe Berchansky · Nadav Timor · Moshe Wasserblat · Roy Schwartz