Skip to yearly menu bar Skip to main content


Search All 2024 Events
 

34 Results

<<   <   Page 2 of 3   >   >>
Poster
Wed 11:00 BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
Matteo Bettini · Amanda Prorok · Vincent MOENS
Workshop
RefactorBench: Evaluating Stateful Reasoning In Language Agents Through Code
Dhruv Gautam · Spandan Garg · Jinu Jang · Neel Sundaresan · Roshanak Zilouchian Moghaddam
Poster
Fri 16:30 On the Effects of Data Scale on UI Control Agents
Wei Li · William Bishop · Alice Li · Christopher Rawles · Folawiyo Campbell-Ajala · Divya Tyamagundlu · Oriana Riva
Poster
Wed 16:30 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Oral
Wed 16:10 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Poster
Fri 11:00 GTA: A Benchmark for General Tool Agents
Jize Wang · Ma Zerun · Yining Li · Songyang Zhang · Cailian Chen · Kai Chen · Xinyi Le
Poster
Thu 16:30 RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Chengquan Guo · Xun Liu · Chulin Xie · Andy Zhou · Yi Zeng · Zinan Lin · Dawn Song · Bo Li
Workshop
GTA: A Benchmark for General Tool Agents
Jize Wang · Ma Zerun · Yining Li · Songyang Zhang · Cailian Chen · Kai Chen · Xinyi Le
Workshop
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng · Zhiyuan Huang · Zhenghai Xue · Xinrun Wang · Bo An · Shuicheng Yan
Workshop
CRAB: Cross-platfrom agent benchmark for multi-modal embodied language model agents
Tianqi Xu · Linyao Chen · Dai-Jie Wu · Yanjun Chen · Zecheng Zhang · Xiang Yao · Zhiqiang Xie · Yongchao Chen · Shilong Liu · Bochen Qian · Philip Torr · Bernard Ghanem · Guohao Li
Poster
Thu 16:30 AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Workshop
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli · Mat Allen · Roman Hauksson · Grace Sodunke · Suhas Hariharan · Carlson Cheng · Wenjie Li · Joshua Clymer · Arjun Yadav