Search All 2024 Events

67 Results

Workshop
Sat 13:48 Inference-Friendly Models With MixAttention
Shashank Rajput · Ying Sheng · Sean Owen · Vitaliy Chiley
Workshop
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao · Yanshuai Cao · Lili Mou
Workshop
Data-Efficient Variational Mutual Information Estimation via Bayesian Self-Consistency
Desi R Ivanova · Marvin Schmitt · Stefan Radev
Workshop
A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression
Alessio Devoto · Yu Zhao · Simone Scardapane · Pasquale Minervini
Poster
Fri 11:00 PrivCirNet: Efficient Private Inference via Block Circulant Transformation
Tianshi Xu · Lemeng Wu · Runsheng Wang · Meng Li
Workshop
Scaling laws for post-training quantized large language models
Zifei Xu · Alexander Lan · Wanzin Yazar · Tristan Webb · Sayeh Sharify · Xin Wang
Workshop
Snakes and Ladders: Accelerating SSM Inference with Speculative Decoding
Yangchao Wu · Yonatan Dukler · Matthew Trager · Alessandro Achille · Wei Xia · Stefano Soatto
Workshop
Skip Transformers: Efficient Inference through Skip-Routing
Matthew Peroni · Dimitris Bertsimas
Workshop
Residual vector quantization for KV cache compression in large language model
Ankur Kumar
Expo Talk Panel
Wed 16:30 Logarithmic Math in accurate and efficient AI inference accelerators
Workshop
Eagle: Efficient Training-Free Router for Multi-LLM Inference
Zesen Zhao · Shuowei Jin · Zhuoqing Morley Mao
Workshop
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Jonathan Mamou · Oren Pereg · Daniel Korat · Moshe Berchansky · Nadav Timor · Moshe Wasserblat · Roy Schwartz