Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLM
Feng Hong · Geng Yu · Yushi Ye · Haicheng Huang · Huangjie Zheng · Ya Zhang · Yanfeng Wang · Jiangchao Yao
Abstract
This paper investigates the severe quality-speed trade-off in existing Diffusion Large Language Models (DLLMs) and attributes it to the irreversibility of standard decoding in DLLMs. To resolve this, we introduce Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable decoding in DLLMs. It employs a parallel draft-and-verify mechanism, aggressively drafting multiple tokens while simultaneously using the model’s bidirectional context to verify and re-mask suspicious ones for refinement. Extensive experiments are conducted to characterize and demonstrate the effectiveness of WINO.
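The draft-and-verify idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify thresholds, the verification rule, or how drafting and verification share a forward pass. The names `draft_and_verify`, `toy_predict`, `tau_draft`, `tau_verify`, and the `MASK` id are hypothetical, and the "model" is a hand-written toy in which odd positions are only predicted correctly once both neighbors are unmasked. The sketch also calls the predictor twice per step for clarity, whereas the abstract describes drafting and verification as simultaneous.

```python
from typing import Callable, List, Tuple

MASK = -1  # hypothetical mask-token id (assumption, not from the paper)

# A predictor maps the current sequence to (top token, confidence) per position.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def draft_and_verify(
    predict: Predictor,
    length: int,
    tau_draft: float = 0.5,   # lenient threshold: "wide-in" (assumed value)
    tau_verify: float = 0.7,  # strict threshold: "narrow-out" (assumed value)
    max_steps: int = 50,
) -> Tuple[List[int], int]:
    """Toy revocable decoding loop: draft many tokens, re-mask doubtful ones."""
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens and steps < max_steps:
        steps += 1
        # Wide-in: unmask every masked position whose confidence clears tau_draft.
        preds = predict(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        drafted = [i for i in masked if preds[i][1] >= tau_draft]
        if not drafted:
            # Always draft at least the most confident masked position.
            drafted = [max(masked, key=lambda i: preds[i][1])]
        for i in drafted:
            tokens[i] = preds[i][0]
        # Narrow-out: re-score all tokens with the fuller context and re-mask
        # any token the model no longer predicts confidently.
        checks = predict(tokens)
        for i, tok in enumerate(tokens):
            top, conf = checks[i]
            if top != tok or conf < tau_verify:
                tokens[i] = MASK
    return tokens, steps


TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # toy "ground truth" sequence


def toy_predict(tokens: List[int]) -> List[Tuple[int, float]]:
    """Hand-written stand-in for a bidirectional model (purely illustrative)."""
    n = len(tokens)
    out = []
    for i in range(n):
        left_ok = i == 0 or tokens[i - 1] != MASK
        right_ok = i == n - 1 or tokens[i + 1] != MASK
        if i % 2 == 0:
            out.append((TARGET[i], 0.8))    # easy position: always correct
        elif left_ok and right_ok:
            out.append((TARGET[i], 0.95))   # correct once context is available
        else:
            out.append((0, 0.6))            # wrong guess without context
    return out


decoded, num_steps = draft_and_verify(toy_predict, len(TARGET))
print(decoded, num_steps)  # [3, 1, 4, 1, 5, 9, 2, 6] 2
```

In this toy run, the first step drafts all eight positions, verification re-masks the four wrong guesses at odd positions, and the second step fills them in correctly. Decoding one token per step would need eight steps for the same sequence. With irreversible decoding, the four wrong guesses from the first step could not be revised.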