All 2024 Events
67 Results · Page 2 of 6
Workshop
Dynamic Vocabulary Pruning in Early-Exit LLMs
Karim Abdel Sadek · Joan Velja · Matteo Nulli · Metod Jazbec
Workshop
Sat 11:48 An Evolved Universal Transformer Memory
Edoardo Cetin · Qi Sun · Tianyu Zhao · Yujin Tang
Workshop
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Jonathan Mamou · Oren Pereg · Daniel Korat · Moshe Berchansky · Nadav Timor · Moshe Wasserblat · Roy Schwartz
Workshop
Efficient Modeling of Irregular Time-Series with Stochastic Optimal Control
Byoungwoo Park · Hyungi Lee · Juho Lee
Workshop
Beyond Token Generation: Adaptive Chunk-Distilled Language Modeling
Yanhong Li · Karen Livescu · Jiawei Zhou
Workshop
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Joao Monteiro · Etienne Marcotte · Pierre-Andre Noel · Valentina Zantedeschi · David Vazquez · Nicolas Chapados · Christopher Pal · Perouz Taslakian
Workshop
Post Training Quantization of Large Language Models with Microscaling Formats
Sayeh Sharify · Utkarsh Saxena · Zifei Xu · Wanzin Yazar · Ilya Soloveychik · Xin Wang
Workshop
A Unified Framework for Speculative Decoding with Multiple Drafters as a Bandit
Taehyeon Kim · Hojung Jung · Se-Young Yun
Workshop
Speculative Diffusion Decoding for Accelerated Language Generation
Jacob K Christopher · Brian Bartoldson · Tal Ben-Nun · Michael Cardei · Bhavya Kailkhura · Nando Fioretto
Poster
Fri 11:00 Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
Rohan Baskar Prabhakar · Hengrui Zhang · David Wentzlaff
Workshop
How Redundant Is the Transformer Stack in Speech Representation Models?
Albert Kjøller Jacobsen · Teresa Scheidt · Lenka Hýlová · Lars Kai Hansen
Poster
Wed 11:00 KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Tianyi Zhang · Jonah Yi · Zhaozhuo Xu · Anshumali Shrivastava