34 Results
Poster
|
Wed 16:30 |
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents Cheng-Kuang Wu · Zhi Rui Tam · Chieh-Yen Lin · Yun-Nung (Vivian) Chen · Hung-yi Lee |
|
Poster
|
Thu 11:00 |
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Ruisheng Cao · Fangyu Lei · Haoyuan Wu · Jixuan Chen · Yeqiao Fu · Hongcheng Gao · Xinzhuang Xiong · Hanchong Zhang · Wenjing Hu · Yuchen Mao · Tianbao Xie · Hongshen Xu · Danyang Zhang · Sida Wang · Ruoxi Sun · Pengcheng Yin · Caiming Xiong · Ansong Ni · Qian Liu · Victor Zhong · Lu Chen · Kai Yu · Tao Yu |
|
Workshop
|
Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case Peng Chen · Pi Bu · Jun Song · Yuan Gao · Bo Zheng |
||
Workshop
|
SPA-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation Jingxuan Chen · Derek Yuen · Bin Xie · Yuhao Yang · Gongwei Chen · Zhihao Wu · Li Yixing · Xurui Zhou · Weiwen Liu · Shuai Wang · Rui Shao · Liqiang Nie · Yasheng Wang · Jianye Hao · Jun Wang · Kun Shao |
||
Workshop
|
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y |
||
Poster
|
Wed 11:00 |
SustainDC: Benchmarking for Sustainable Data Center Control Avisek Naug · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Vineet Gundecha · Cullen Bash · Sahand Ghorbanpour · Sajad Mousavi · Ashwin Ramesh Babu · Dejan Markovikj · Lekhapriya Dheeraj Kashyap · Desik Rengarajan · Soumyendu Sarkar |
|
Poster
|
Wed 16:30 |
RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts Jiaheng Liu · Zehao Ni · Haoran Que · Sun · Noah Wang · Jian Yang · Jiakai Wang · Hongcheng Guo · Zhongyuan Peng · Ge Zhang · Jiayi Tian · Xingyuan Bu · Ke Xu · Wenge Rong · Junran Peng · Zhao-Xiang Zhang |
|
Poster
|
Fri 11:00 |
DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents Peter Jansen · Marc-Alexandre Côté · Tushar Khot · Erin Bransom · Bhavana Dalvi Mishra · Bodhisattwa Prasad Majumder · Oyvind Tafjord · Peter Clark |