ARM: Adaptive Reasoning Model
Tinghui Zhu · Jian Xie · Yikai Zhang · Aili Chen · Kai Zhang · Yu Su · Yanghua Xiao
Abstract
While large reasoning models demonstrate strong performance on complex tasks, they lack the ability to adjust reasoning token usage based on task difficulty. This often leads to the “overthinking” problem—excessive and unnecessary reasoning—which, although it can be partially mitigated by human intervention to cap the token budget, fundamentally contradicts the goal of achieving fully autonomous AI. In this work, we propose the Adaptive Reasoning Model (ARM), which adaptively selects an appropriate reasoning format for the task at hand. These formats include three efficient ones—Direct Answer, Short CoT, and Code—as well as a more elaborate format, Long CoT. To train ARM, we introduce Ada-GRPO, an adaptation of Group Relative Policy Optimization (GRPO) that addresses the format collapse issue in traditional GRPO. Ada-GRPO enables ARM to achieve high token efficiency, reducing token usage by $\sim$30\% on average and up to $\sim$70\%, while maintaining performance comparable to models that rely solely on Long CoT. All resources will be released.
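To make the format-collapse issue concrete, the sketch below contrasts standard GRPO group-relative advantages with a hypothetical Ada-GRPO-style reweighting. In vanilla GRPO, once one format (e.g. Short CoT) dominates the sampled group, rarely sampled formats receive vanishing gradient signal and the policy collapses onto a single format. A natural counter, sketched here purely as an illustration (the paper's exact formulation is not given in this abstract), is to scale each response's advantage by the inverse frequency of its format within the group; the function names and the `G / count` factor are assumptions, not the authors' definition.

```python
import math
from collections import Counter

def grpo_advantages(rewards):
    """Standard GRPO: normalize each sampled response's reward
    against the mean and std of its own group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    if std == 0:
        return [0.0 for _ in rewards]  # all rewards equal: no signal
    return [(r - mean) / std for r in rewards]

def ada_grpo_advantages(rewards, formats):
    """Illustrative Ada-GRPO-style reweighting (assumed form):
    scale each advantage by the inverse in-group frequency of its
    reasoning format, so rarely sampled formats keep gradient signal
    instead of collapsing out of the policy."""
    base = grpo_advantages(rewards)
    counts = Counter(formats)
    group_size = len(formats)
    return [a * (group_size / counts[f]) for a, f in zip(base, formats)]

# Example group: three Short CoT rollouts and one Long CoT rollout.
rewards = [1.0, 1.0, 0.0, 0.0]
formats = ["short_cot", "short_cot", "short_cot", "long_cot"]
print(grpo_advantages(rewards))              # [1.0, 1.0, -1.0, -1.0]
print(ada_grpo_advantages(rewards, formats))  # rare long_cot rollout amplified
```

Under this reweighting, the lone Long CoT rollout's advantage is amplified fourfold relative to each Short CoT rollout, which is one simple way to keep all four formats explored during training.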