Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Abstract
Masked Diffusion Language Models (MDLMs) are becoming competitive with their autoregressive counterparts but commonly decode with a fixed number of steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold; however, we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs, as measured by cosine similarity. Motivated by these two observations, we introduce \textbf{One-Shot Dynamic Thresholding (OSDT)}, which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy–throughput trade-offs (\textbf{+24\%} tokens/s on GSM8K at the \textbf{best} accuracy, \textbf{+45\%} on GPQA with comparable accuracy, and \textbf{+50\%} on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to exploit reusable task-level confidence signatures for more general algorithmic and systems innovations in diffusion decoding.
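To make the calibrate-once-then-reuse idea concrete, the following is a minimal sketch of a one-shot thresholding scheme: per-step thresholds are derived from the confidence trajectory of a single calibration sequence, and each subsequent decoding step unmasks in parallel every token whose confidence clears that step's threshold. All function names, the median-minus-margin rule, and the fallback behavior are illustrative assumptions, not the paper's exact algorithm.

```python
from statistics import median

def calibrate_thresholds(conf_trajectory, margin=0.05):
    # conf_trajectory: per-step lists of confidences recorded while decoding
    # ONE calibration sequence. A per-step threshold is set slightly below
    # the median confidence at that step (median-minus-margin is an
    # illustrative choice, not the paper's calibration rule).
    return [max(0.0, median(step_confs) - margin) for step_confs in conf_trajectory]

def parallel_unmask(confidences, threshold):
    # Indices of masked positions confident enough to unmask this step.
    # Fall back to the single most confident token so decoding always
    # makes progress, mirroring standard confidence-based parallel decoding.
    idx = [i for i, c in enumerate(confidences) if c >= threshold]
    return idx if idx else [max(range(len(confidences)), key=confidences.__getitem__)]
```

In use, `calibrate_thresholds` runs once per task on a single input, and the resulting threshold list replaces a static global cutoff for all later inputs, at the cost of one extra decoded sequence.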