

Search All 2024 Events
 

67 Results

Workshop
Hysteresis Activation Function for Efficient Inference
Moshe Kimhi · Idan Kashani · Chaim Baskin · Avi Mendelson
Poster
Thu 16:30
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
Renze Chen · Zhuofeng Wang · Beiquan Cao · Tong Wu · Size Zheng · Xiuhong Li · Xuechao Wei · Shengen Yan · Meng Li · Yun Liang
Workshop
A Systematic Evaluation of Decoding-Free Generative Candidate Selection Methods
Mingyu Derek Ma · Yanna Ding · Zijie Huang · Jianxi Gao · Yizhou Sun · Wei Wang
Workshop
Distributed Speculative Inference of Large Language Models is Provably Faster
Nadav Timor · Jonathan Mamou · Oren Pereg · Moshe Berchansky · Daniel Korat · Moshe Wasserblat · Tomer Galanti · Michal Gordon (Kiwkowitz) · David Harel
Workshop
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Costin-Andrei Oncescu · Sanket Purandare · Stratos Idreos · Sham Kakade
Poster
Wed 16:30
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue · Yulin Wang · Bingyi Kang · Yizeng Han · Shenzhi Wang · Shiji Song · Jiashi Feng · Gao Huang
Workshop
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade · Mahyar Najibi · Irina Belousova · Qichen Fu · Henry Mason · Mohammad Rastegari
Workshop
S2D: Sorted Speculative Decoding For More Efficient Deployment of Large Language Models
Parsa Kavehzadeh · Mohammadreza Pourreza · Mojtaba Valipour · Tianshu Zhu · Haoli Bai · Ali Ghodsi · Boxing Chen · Mehdi Rezagholizadeh
Workshop
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
Wenhao Zheng · Yixiao Chen · Weitong Zhang · Souvik Kundu · Yun Li · Zhengzhong Liu · Eric Xing · Hongyi Wang · Huaxiu Yao
Poster
Nimbus: Secure and Efficient Two-Party Inference for Transformers
Zhengyi Li · Kang Yang · Jin Tan · Wen-jie Lu · Haoqi Wu · Xiao Wang · Yu Yu · Derun Zhao · Yancheng Zheng · Minyi Guo · Jingwen Leng
Workshop
Enabling Resource-Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines
Lei Gao · Amir Ziashahabi · Yue Niu · Salman Avestimehr · Murali Annavaram
Workshop
RAEE: A Robust Retrieval-Augmented Early Exiting Framework for Efficient Inference
Lianming HUANG · Shangyu Wu · Yufei Cui · Ying Xiong · Xue (Steve) Liu · Tei-Wei Kuo · Nan Guan · Chun Jason XUE