67 Results
Workshop
|
Hysteresis Activation Function for Efficient Inference Moshe Kimhi · Idan Kashani · Chaim Baskin · Avi Mendelson |
||
Poster
|
Thu 16:30 |
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction Renze Chen · Zhuofeng Wang · Beiquan Cao · Tong Wu · Size Zheng · Xiuhong Li · Xuechao Wei · Shengen Yan · Meng Li · Yun Liang |
|
Workshop
|
A Systematic Evaluation of Decoding-Free Generative Candidate Selection Methods Mingyu Derek Ma · Yanna Ding · Zijie Huang · Jianxi Gao · Yizhou Sun · Wei Wang |
||
Workshop
|
Distributed Speculative Inference of Large Language Models is Provably Faster Nadav Timor · Jonathan Mamou · Oren Pereg · Moshe Berchansky · Daniel Korat · Moshe Wasserblat · Tomer Galanti · Michal Gordon (Kiwkowitz) · David Harel |
||
Workshop
|
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond Costin-Andrei Oncescu · Sanket Purandare · Stratos Idreos · Sham Kakade |
||
Poster
|
Wed 16:30 |
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Yang Yue · Yulin Wang · Bingyi Kang · Yizeng Han · Shenzhi Wang · Shiji Song · Jiashi Feng · Gao Huang |
|
Workshop
|
Speculative Streaming: Fast LLM Inference without Auxiliary Models Nikhil Bhendawade · Mahyar Najibi · Irina Belousova · Qichen Fu · Henry Mason · Mohammad Rastegari |
||
Workshop
|
S2D: Sorted Speculative Decoding For More Efficient Deployment of Large Language Models Parsa Kavehzadeh · Mohammadreza Pourreza · Mojtaba Valipour · Tianshu Zhu · Haoli Bai · Ali Ghodsi · Boxing Chen · Mehdi Rezagholizadeh |
||
Workshop
|
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing Wenhao Zheng · Yixiao Chen · Weitong Zhang · Souvik Kundu · Yun Li · Zhengzhong Liu · Eric Xing · Hongyi Wang · Huaxiu Yao |
||
Poster
|
Nimbus: Secure and Efficient Two-Party Inference for Transformers Zhengyi Li · Kang Yang · Jin Tan · Wen-jie Lu · Haoqi Wu · Xiao Wang · Yu Yu · Derun Zhao · Yancheng Zheng · Minyi Guo · Jingwen Leng |
||
Workshop
|
Enabling Resource-Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines Lei Gao · Amir Ziashahabi · Yue Niu · Salman Avestimehr · Murali Annavaram |
||
Workshop
|
RAEE: A Robust Retrieval-Augmented Early Exiting Framework for Efficient Inference Lianming HUANG · Shangyu Wu · Yufei Cui · Ying Xiong · Xue (Steve) Liu · Tei-Wei Kuo · Nan Guan · Chun Jason XUE |