67 Results
Workshop
|
Sat 11:42 |
GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference · Qingru Zhang · Souvik Kundu · Geonhwa Jeong · Zaoxing Liu · Tushar Krishna · Tuo Zhao |
|
Workshop
|
Approximate Top-k for Increased Parallelism · Oscar Key · Luka Ribar · Alberto Cattaneo · Luke Hudlass-Galley · Douglas Orr |
||
Workshop
|
Fused-Layer CNNs for Memory-Efficient Inference on Microcontrollers · Mark Deutel · Frank Hannig · Christopher Mutschler · Jürgen Teich |
||
Workshop
|
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection · Yun Zhu · Jia-Chen Gu · Caitlin Sikora · Ho Ko · Yinxiao Liu · Chu-Cheng Lin · Lei Shu · Liangchen Luo · Lei Meng · Bang Liu · Jindong Chen |
||
Poster
|
Fri 16:30 |
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation · Wangbo Zhao · Jiasheng Tang · Yizeng Han · Yibing Song · Kai Wang · Gao Huang · Fan Wang · Yang You |
|
Poster
|
Wed 16:30 |
Toward Efficient Inference for Mixture of Experts · Haiyang Huang · Newsha Ardalani · Anna Sun · Liu Ke · Shruti Bhosale · Hsien-Hsin Lee · Carole-Jean Wu · Benjamin Lee |
|
Workshop
|
Optimizing the IFMIF-DONES Particle Accelerator with Differentiable Deep Learning Surrogate Models · Galo Gallardo · Guillermo Rodriguez Llorente · Lucas Magariños · Rodrigo Morant Navascués · Nikita Kkhvatkin Petrovsky · Roberto Gómez-Espinosa Martín |