Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Abstract
Masked Diffusion Language Models (MDLMs) are becoming competitive with their autoregressive counterparts but commonly decode with a fixed number of steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold; however, we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs, as measured by cosine similarity. Motivated by these two observations, we introduce \textbf{One-Shot Dynamic Thresholding (OSDT)}, which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy–throughput trade-offs (\textbf{+24\%} tokens/s on GSM8K at the \textbf{best} accuracy, \textbf{+45\%} on GPQA with comparable accuracy, and \textbf{+50\%} on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to exploit reusable task-level confidence signatures for more general algorithmic and systems innovations in diffusion decoding.
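To make the calibrate-once-then-reuse idea concrete, the following is a minimal sketch of a one-shot thresholding scheme: per-step thresholds are derived from the confidence trajectory of a single calibration sequence, and each subsequent decoding step unmasks in parallel every token whose confidence clears that step's threshold. All function names, the median-minus-margin rule, and the fallback behavior are illustrative assumptions, not the paper's exact algorithm.

```python
from statistics import median

def calibrate_thresholds(conf_trajectory, margin=0.05):
    # conf_trajectory: per-step lists of confidences recorded while decoding
    # ONE calibration sequence. A per-step threshold is set slightly below
    # the median confidence at that step (median-minus-margin is an
    # illustrative choice, not the paper's calibration rule).
    return [max(0.0, median(step_confs) - margin) for step_confs in conf_trajectory]

def parallel_unmask(confidences, threshold):
    # Indices of masked positions confident enough to unmask this step.
    # Fall back to the single most confident token so decoding always
    # makes progress, mirroring standard confidence-based parallel decoding.
    idx = [i for i, c in enumerate(confidences) if c >= threshold]
    return idx if idx else [max(range(len(confidences)), key=confidences.__getitem__)]
```

In use, `calibrate_thresholds` runs once per task on a single input, and the resulting threshold list replaces a static global cutoff for all later inputs, at the cost of one extra decoded sequence.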