Task Completion Agents Are Not Ideal Collaborators
Abstract
Large Language Model (LLM) agents are increasingly capable of handling complex tasks autonomously, but current development and evaluation practices remain centered on one-shot task completion. This dominant paradigm fails to account for the inherently iterative and collaborative nature of many real-world problems, where human goals are often underspecified and evolve over time. This position paper argues for a shift in focus: from building and assessing task completion agents to developing \emph{collaborative agents} --- those evaluated not only by the quality of their final outputs but also by how well they engage with and enhance human effort throughout the problem-solving process. To support this shift, we introduce \textbf{collaborative effort scaling}, a framework that captures how an agent's utility grows with increasing user involvement. Through case studies and simulated evaluations, we show that state-of-the-art agents often underperform in multi-turn, real-world scenarios, revealing a missing ingredient in agent design: the ability to sustain engagement and scaffold user understanding. Collaborative effort scaling offers a new lens for diagnosing agent behavior and guiding development toward deeper, more adaptive interaction.