Reasoning in Motion: How Agents Learn to Think
Meiqi Sun
Abstract
Web agents need a new training paradigm built on trajectories that capture how perception, planning, and action co-evolve. In the first half, we'll explore how to scale reasoning through experience using a self-improving engine that expands data by structure, a shift toward scaling intelligence through coherence and grounded reasoning rather than rote imitation. In the second half, I'll show how simple synthetic reasoning gyms with verifiable rewards can serve as catalysts that surface latent intelligence and drive generalization to open-ended web tasks. We'll compare SFT and RL, and discuss how balanced training mixtures strengthen robustness and cross-domain transfer.
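To make "verifiable rewards" concrete, below is a minimal, purely illustrative Python sketch of one possible synthetic reasoning gym. The task family, function names, and binary reward scheme are assumptions for illustration, not the system described in the talk; the only point is that the answer can be checked programmatically, so the reward needs no human judgment.

```python
import random

def make_task(rng: random.Random, steps: int = 3):
    """Generate a synthetic arithmetic-chain task with a known answer.

    Produces a string like "((7 + 2) * 3) - 5"; because the expression is
    constructed by the gym itself, the ground-truth answer is known and the
    reward is exactly verifiable.
    """
    expr = str(rng.randint(1, 9))
    for _ in range(steps):
        op = rng.choice(["+", "-", "*"])
        expr = f"({expr} {op} {rng.randint(1, 9)})"
    return expr, eval(expr)  # safe here: the expression is generated, not user input

def verifiable_reward(predicted: str, answer: int) -> float:
    """Binary reward: 1.0 if the agent's final answer matches exactly, else 0.0."""
    try:
        return 1.0 if int(predicted.strip()) == answer else 0.0
    except ValueError:
        return 0.0  # malformed answers earn no reward

if __name__ == "__main__":
    rng = random.Random(0)
    task, answer = make_task(rng)
    # In a real gym, agent_answer would come from the policy being trained.
    agent_answer = str(answer)  # stand-in for a model's output
    print(task, "->", verifiable_reward(agent_answer, answer))
```

Any task family with an exact checker (arithmetic chains, sorting, string matching, small puzzles) can play this role: the reward is computed by checking rather than judging, which is what lets such gyms scale as training signal.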