Search All 2024 Events
 

530 Results

Page 3 of 45
Poster
Thu 11:00 Mercury: A Code Efficiency Benchmark for Code Large Language Models
Mingzhe Du · Anh Tuan Luu · Bin Ji · Qian Liu · See-Kiong Ng
Poster
Fri 16:30 STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Shirley Wu · Shiyu Zhao · Michihiro Yasunaga · Kexin Huang · Kaidi Cao · Qian Huang · Vassilis Ioannidis · Karthik Subbian · James Zou · Jure Leskovec
Poster
Wed 11:00 VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
Houlun Chen · Xin Wang · Hong Chen · Zeyang Zhang · Wei Feng · Bin Huang · Jia Jia · Wenwu Zhu
Poster
Wed 11:00 A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data
Adrian Remonda · Nicklas Hansen · Ayoub Raji · Nicola Musiu · Marko Bertogna · Eduardo Veas · Xiaolong Wang
Poster
Wed 16:30 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Poster
Wed 11:00 DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA
Aman Patel · Arpita Singhal · Austin Wang · Anusri Pampari · Maya Kasowski · Anshul Kundaje
Poster
Empowering and Assessing the Utility of Large Language Models in Crop Science
Hang Zhang · Jiawei SUN · Renqi Chen · Wei Liu · Zhonghang Yuan · Xinzhe Zheng · Zhefan Wang · Zhiyuan Yang · Hang Yan · Han-Sen Zhong · Xiqing Wang · Wanli Ouyang · Fan Yang · Nanqing Dong
Affinity Event
A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
Yutong Zhou · Masahiro Ryo
Poster
Thu 16:30 BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel-Lamparth · Amelia Hardy · Chandler Smith · Max Lamparth · Malcolm Hardy · Mykel J Kochenderfer
Poster
Thu 16:30 MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions
Felix Fent · Fabian Kuttenreich · Florian Ruch · Farija Rizwin · Stefan Juergens · Lorenz Lechermann · Christian Nissler · Andrea Perl · Ulrich Voll · Min Yan · Markus Lienkamp
Poster
Wed 16:30 A Systematic Review of NeurIPS Dataset Management Practices
Yiwei Wu · Leah Ajmani · Shayne Longpre · Hanlin Li
Poster
Fri 16:30 Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti · Javier Rando · Daniel Paleka · Silaghi Florin · Dragos Albastroiu · Niv Cohen · Yuval Lemberg · Reshmi Ghosh · Rui Wen · Ahmed Salem · Giovanni Cherubin · Santiago Zanella-Beguelin · Robin Schmid · Victor Klemm · Takahiro Miki · Chenhao Li · Stefan Kraft · Mario Fritz · Florian Tramer · Sahar Abdelnabi · Lea Schönherr