67 Results
Workshop | Dynamic Vocabulary Pruning in Early-Exit LLMs · Karim Abdel Sadek · Joan Velja · Matteo Nulli · Metod Jazbec
Workshop | Sat 11:48 | An Evolved Universal Transformer Memory · Edoardo Cetin · Qi Sun · Tianyu Zhao · Yujin Tang
Workshop | Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models · Jonathan Mamou · Oren Pereg · Daniel Korat · Moshe Berchansky · Nadav Timor · Moshe Wasserblat · Roy Schwartz
Workshop | Efficient Modeling of Irregular Time-Series with Stochastic Optimal Control · Byoungwoo Park · Hyungi Lee · Juho Lee
Workshop | Beyond Token Generation: Adaptive Chunk-Distilled Language Modeling · Yanhong Li · Karen Livescu · Jiawei Zhou
Workshop | XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference · Joao Monteiro · Etienne Marcotte · Pierre-Andre Noel · Valentina Zantedeschi · David Vazquez · Nicolas Chapados · Christopher Pal · Perouz Taslakian
Workshop | Post Training Quantization of Large Language Models with Microscaling Formats · Sayeh Sharify · Utkarsh Saxena · Zifei Xu · Wanzin Yazar · Ilya Soloveychik · Xin Wang
Workshop | A Unified Framework for Speculative Decoding with Multiple Drafters as a Bandit · Taehyeon Kim · Hojung Jung · Se-Young Yun
Workshop | Speculative Diffusion Decoding for Accelerated Language Generation · Jacob K Christopher · Brian Bartoldson · Tal Ben-Nun · Michael Cardei · Bhavya Kailkhura · Nando Fioretto
Poster | Fri 11:00 | Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference · Rohan Baskar Prabhakar · Hengrui Zhang · David Wentzlaff
Workshop | How Redundant Is the Transformer Stack in Speech Representation Models? · Albert Kjøller Jacobsen · Teresa Scheidt · Lenka Hýlová · Lars Kai Hansen
Poster | Wed 11:00 | KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization · Tianyi Zhang · Jonah Yi · Zhaozhuo Xu · Anshumali Shrivastava