Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Abstract
Recent advancements in LLM-based agents have demonstrated remarkable capabilities in handling complex, knowledge-intensive tasks by integrating external tools. Among diverse tools, search tools play a pivotal role in accessing vast external knowledge. Reinforcement learning (RL) stands out as a natural choice for learning tool use. However, existing RL agents still fall short of expert-level Search Intelligence: the ability to resolve ambiguous queries, analyze results, and conduct thorough exploration. Existing approaches also fall short in scalability, efficiency, and data quality. For example, the small turn limits in existing online RL methods, e.g., ≤ 10 turns, restrict the learning of complex strategies. This paper introduces ASearcher, a large-scale RL training project for search agents. Our key contributions include: (1) scalable, fully asynchronous RL training that enables long-horizon search while maintaining high training efficiency; and (2) a prompt-based LLM agent that autonomously synthesizes high-quality, challenging question-answer (QA) pairs, yielding a large-scale QA dataset. Through RL training, our prompt-based 32B agent achieves substantial improvements, with Avg@4 gains of +22.4 on xBench and +15.0 on GAIA. Notably, our agent exhibits extreme long-horizon search, with trajectories exceeding 100 tool-call turns and 400k output tokens during training. With a simple agent design and no external LLMs, ASearcher-Web-QwQ achieves state-of-the-art Avg@4 scores of 51.1 on xBench and 58.1 on GAIA.
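The abstract does not detail the asynchronous training mechanism itself, so the following is only a minimal illustrative sketch of the general pattern behind fully asynchronous agentic RL: rollout actors and the learner are decoupled through a queue, so a single 100+-turn trajectory never stalls the training loop. All names here (collect_trajectory, actor_loop, learner_loop) and the threading/queue setup are our own assumptions for illustration, not ASearcher's actual implementation.

```python
import queue
import random
import threading
import time

MAX_TURNS = 128                        # long-horizon budget, far beyond a 10-turn cap
TRAJ_QUEUE = queue.Queue(maxsize=64)   # buffer decoupling rollout actors from the learner

def collect_trajectory(agent_version: int) -> dict:
    """Roll out one search episode; turn counts vary widely, so actors finish at different times."""
    num_turns = random.randint(1, MAX_TURNS)
    time.sleep(num_turns * 0.001)  # stand-in for LLM calls plus web-search latency
    return {"version": agent_version, "turns": num_turns, "reward": random.random()}

def actor_loop(stop: threading.Event, version: list) -> None:
    # Each actor generates trajectories independently; a slow, 100+-turn episode
    # never blocks other actors or the learner (fully asynchronous generation).
    while not stop.is_set():
        TRAJ_QUEUE.put(collect_trajectory(version[0]))

def learner_loop(stop: threading.Event, version: list, num_updates: int) -> None:
    # The learner consumes whatever trajectories are ready, possibly produced by
    # slightly stale policy versions: off-policy slack is the price of throughput.
    for step in range(num_updates):
        batch = [TRAJ_QUEUE.get() for _ in range(8)]
        staleness = version[0] - min(t["version"] for t in batch)
        version[0] += 1  # stand-in for a gradient update and weight broadcast
        print(f"update {step}: max staleness={staleness}, "
              f"longest episode={max(t['turns'] for t in batch)} turns")
    stop.set()

if __name__ == "__main__":
    stop, version = threading.Event(), [0]  # shared mutable policy-version counter
    for _ in range(4):
        threading.Thread(target=actor_loop, args=(stop, version), daemon=True).start()
    learner_loop(stop, version, num_updates=5)
```

In this toy setup, the reported staleness shows the learner training on trajectories from slightly older policy versions, which is exactly the trade-off an asynchronous design accepts to keep GPUs busy while long-horizon episodes complete.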