DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
Abstract
Speculative decoding accelerates LLM inference by letting a small draft model propose multiple tokens that a larger target model verifies in parallel, but rigid verification that enforces an exact distributional match rejects many plausible tokens and limits the speedup. We first introduce Static Ensemble, a training‑free, fixed‑weight mixture of the draft and target distributions that provably traces the Pareto‑optimal trade‑off between rejection probability and distributional bias. To further raise acceptance without sacrificing quality, we propose Diversed (DynamIc VErification Relaxed SpEculative Decoding), which learns context‑dependent mixing weights to form a flexible verification target. This relaxed verification admits safe tokens more often while preserving correctness. Theory and experiments show that Diversed achieves significantly higher inference efficiency than both conventional speculative decoding and the static baseline.
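To make the relaxed verification rule concrete, below is a minimal NumPy sketch of the Static Ensemble case under one plausible reading of the abstract: the standard speculative-sampling acceptance test is run against the mixture q = (1 − λ)·p_target + λ·p_draft rather than against the target distribution alone. All names, array shapes, and the fixed scalar `lam` are illustrative assumptions, not the paper's implementation; Diversed would instead predict the mixing weight per context.

```python
import numpy as np

def relaxed_verify(draft_probs, target_probs, tokens, lam, rng):
    """Verify draft tokens against q = (1 - lam) * p_target + lam * p_draft.

    lam = 0.0 recovers exact (lossless) speculative decoding; larger lam
    accepts more draft tokens at the cost of distributional bias. A fixed
    scalar lam is the Static Ensemble setting; Diversed would predict it
    from context. Shapes: draft_probs, target_probs are (k, vocab);
    tokens is a length-k sequence sampled from the draft model.
    """
    out = []
    for i, tok in enumerate(tokens):
        p = draft_probs[i]                           # draft distribution at step i
        q = (1.0 - lam) * target_probs[i] + lam * p  # relaxed verification target
        # Standard acceptance test, run against q instead of the target alone.
        if rng.random() < min(1.0, q[tok] / p[tok]):
            out.append(int(tok))
        else:
            # On rejection, resample from the normalized residual max(q - p, 0)
            # and discard the remaining draft tokens, as in vanilla speculation.
            residual = np.maximum(q - p, 0.0)
            residual /= residual.sum()
            out.append(int(rng.choice(len(residual), p=residual)))
            break
    return out


# Toy usage: verify 3 draft tokens over a 5-token vocabulary.
rng = np.random.default_rng(0)
k, vocab = 3, 5
draft = rng.dirichlet(np.ones(vocab), size=k)
target = rng.dirichlet(np.ones(vocab), size=k)
tokens = [int(rng.choice(vocab, p=draft[i])) for i in range(k)]
print(relaxed_verify(draft, target, tokens, lam=0.3, rng=rng))
```

Because q contains p_draft with weight λ, the acceptance ratio q(t)/p(t) is bounded below by λ, which is one way to see how mixing raises the acceptance rate while λ = 0 keeps verification exact.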