34 Results
Poster | Wed 11:00 | BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | Matteo Bettini · Amanda Prorok · Vincent MOENS
Workshop | | RefactorBench: Evaluating Stateful Reasoning In Language Agents Through Code | Dhruv Gautam · Spandan Garg · Jinu Jang · Neel Sundaresan · Roshanak Zilouchian Moghaddam
Poster | Fri 16:30 | On the Effects of Data Scale on UI Control Agents | Wei Li · William Bishop · Alice Li · Christopher Rawles · Folawiyo Campbell-Ajala · Divya Tyamagundlu · Oriana Riva
Poster | Wed 16:30 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Oral | Wed 16:10 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Poster | Fri 11:00 | GTA: A Benchmark for General Tool Agents | Jize Wang · Ma Zerun · Yining Li · Songyang Zhang · Cailian Chen · Kai Chen · Xinyi Le
Poster | Thu 16:30 | RedCode: Risky Code Execution and Generation Benchmark for Code Agents | Chengquan Guo · Xun Liu · Chulin Xie · Andy Zhou · Yi Zeng · Zinan Lin · Dawn Song · Bo Li
Workshop | | GTA: A Benchmark for General Tool Agents | Jize Wang · Ma Zerun · Yining Li · Songyang Zhang · Cailian Chen · Kai Chen · Xinyi Le
Workshop | | AgentStudio: A Toolkit for Building General Virtual Agents | Longtao Zheng · Zhiyuan Huang · Zhenghai Xue · Xinrun Wang · Bo An · Shuicheng Yan
Workshop | | CRAB: Cross-platform agent benchmark for multi-modal embodied language model agents | Tianqi Xu · Linyao Chen · Dai-Jie Wu · Yanjun Chen · Zecheng Zhang · Xiang Yao · Zhiqiang Xie · Yongchao Chen · Shilong Liu · Bochen Qian · Philip Torr · Bernard Ghanem · Guohao Li
Poster | Thu 16:30 | AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Workshop | | GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | Anthony Costarelli · Mat Allen · Roman Hauksson · Grace Sodunke · Suhas Hariharan · Carlson Cheng · Wenjie Li · Joshua Clymer · Arjun Yadav