Internal Value Functions: Leveraging Hidden States for Efficient Test-Time Scaling in Large Reasoning Models
Abstract
Large Reasoning Models (LRMs) generate extensive hidden states during inference, which encode rich information about the input context and probabilistically influence future token predictions. We propose Internal Value Functions (IVF), a novel approach that leverages these hidden states to approximate state-value functions, predicting how likely a partial reasoning trajectory is to converge to the correct answer without requiring additional inference steps. Unlike traditional Process Reward Models (PRMs), which rely on separate model evaluations, our method extracts predictive signals from intermediate representations already computed during the forward pass, enabling efficient implementation of several test-time scaling techniques. Experimental results on challenging reasoning benchmarks demonstrate that IVF matches or exceeds the performance of external PRMs while significantly reducing computational overhead.
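To make the core idea concrete, the sketch below shows one plausible minimal instantiation of an internal value function: a learned linear probe (`value_head` with parameters `w`, `b` — hypothetical names and untrained random values, not from the paper) that maps each per-token hidden state from the model's forward pass to an estimated probability that the partial trajectory ending at that token will reach a correct answer. The abstract does not specify the probe's architecture; a linear head over hidden states is assumed here purely for illustration.

```python
import numpy as np

def value_head(hidden_states: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Map each hidden state h_t to an estimate of P(correct answer | prefix up to t).

    hidden_states: (T, d) array of per-token hidden states, already produced
                   by the LRM's forward pass (no extra inference required).
    w, b:          probe parameters; a real probe would be trained on
                   trajectories labeled by final-answer correctness.
    """
    logits = hidden_states @ w + b           # (T,) one logit per prefix
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid -> per-step value estimates

# Toy usage: hidden size d=4, T=3 reasoning steps (random stand-ins for real states).
rng = np.random.default_rng(0)
h = rng.standard_normal((3, 4))
w = rng.standard_normal(4)
values = value_head(h, w, b=0.0)
print(values.shape)  # one value estimate per partial-trajectory prefix
```

In a test-time scaling loop (e.g. beam search over reasoning steps), these per-prefix values could rank candidate continuations in place of a separate PRM call, which is the source of the claimed efficiency gain.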