Agentic AI systems (LLM-driven agents capable of autonomous planning, tool use, and multi-step task execution) are rapidly advancing, yet methods for evaluating them remain underdeveloped. Traditional metrics for static or single-turn tasks fail to capture the complexity of open-ended, long-horizon interactions where goals evolve and behaviors emerge dynamically. This social aims to bridge research and industry perspectives on designing frameworks, simulation environments, and metrics that assess reliability, alignment, and safety in autonomous agents. Through lightning talks, panel discussions, and networking, the event fosters an interactive exchange on how to meaningfully evaluate and benchmark the next generation of agentic AI systems.