Speculative Streaming: Fast LLM Inference without Auxiliary Models

Nikhil Bhendawade ⋅ Mahyar Najibi ⋅ Irina Belousova ⋅ Qichen Fu ⋅ Henry Mason ⋅ Mohammad Rastegari

Abstract
