205 Results
Workshop
|
Benchmarking Large Language Models as AI Research Agents Qian Huang · Jian Vora · Percy Liang · Jure Leskovec |
||
Workshop
|
LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion Firas Al-Hafez · Davide Tateo · Jan Peters |
||
Workshop
|
PLPilot: Benchmark an Automated Programming Language Design Framework Enabled by Large Language Models Kaiyan Chang · Kubn Wang · Mengdi Wang · Shengwen Liang · Yinhe Han · Huawei Li · Xiaowei Li · Ying Wang |
||
Workshop
|
Towards a more inductive world for drug repurposing approaches Jesus de la Fuente Cedeño · Guillermo Serrano · Uxia Veleiro · Mikel Casals · Laura Vera · Marija Pizurica · Antonio Pineda-Lucena · Idoia Ochoa · Silve Vicent · Olivier Gevaert · Mikel Hernaez |
||
Workshop
|
WebArena: A Realistic Web Environment for Building Autonomous Agents Shuyan Zhou · Frank F. Xu · Hao Zhu · Xuhui Zhou · Robert Lo · Abishek Sridhar · Xianyi Cheng · Tianyue Ou · Yonatan Bisk · Daniel Fried · Uri Alon · Graham Neubig |
||
Workshop
|
PoseCheck: Generative Models for 3D Structure-based Drug Design Produce Unrealistic Poses Charles Harris · Kieran Didi · Arian Jamasb · Chaitanya K. Joshi · Simon Mathis · Pietro Lió · Tom Blundell |
||
Workshop
|
Towards a Situational Awareness Benchmark for LLMs Rudolf Laine · Alexander Meinke · Owain Evans |
||
Workshop
|
ARB: Advanced Reasoning Benchmark for Large Language Models Tom Sawada · Daniel Paleka · Alexander Havrilla · Pranav Tadepalli · Paula Vidas · Alexander Kranias · John Nay · Kshitij Gupta · Aran Komatsuzaki |
||
Workshop
|
MUBen: Benchmarking the Uncertainty of Molecular Representation Models Yinghao Li · Lingkai Kong · Yuanqi Du · Yue Yu · Yuchen Zhuang · Wenhao Mu · Chao Zhang |
||
Workshop
|
HomeRobot: Open-Vocabulary Mobile Manipulation Sriram Yenamandra · Arun Ramachandran · Karmesh Yadav · Austin Wang · Mukul Khanna · Theophile Gervet · Tsung-Yen Yang · Vidhi Jain · Alexander Clegg · John Turner · Zsolt Kira · Manolis Savva · Angel Chang · Devendra Singh Chaplot · Dhruv Batra · Roozbeh Mottaghi · Yonatan Bisk · Chris Paxton |
||
Workshop
|
PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design Chuanrui Wang · Bozitao Zhong · Zuobai Zhang · Narendra Chaudhary · Sanchit Misra · Jian Tang |
||
Workshop
|
Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus Cody Tipton · Elizabeth Coda · Davis Brown · Alyson Bittner · Caitlin Hutten · Grayson Jorgenson · Tegan Emerson · Henry Kvinge |