530 Results
Poster
|
Thu 11:00 |
Mercury: A Code Efficiency Benchmark for Code Large Language Models
Mingzhe Du · Anh Tuan Luu · Bin Ji · Qian Liu · See-Kiong Ng |
|
Poster
|
Fri 16:30 |
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Shirley Wu · Shiyu Zhao · Michihiro Yasunaga · Kexin Huang · Kaidi Cao · Qian Huang · Vassilis Ioannidis · Karthik Subbian · James Zou · Jure Leskovec |
|
Poster
|
Wed 11:00 |
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
Houlun Chen · Xin Wang · Hong Chen · Zeyang Zhang · Wei Feng · Bin Huang · Jia Jia · Wenwu Zhu |
|
Poster
|
Wed 11:00 |
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data
Adrian Remonda · Nicklas Hansen · Ayoub Raji · Nicola Musiu · Marko Bertogna · Eduardo Veas · Xiaolong Wang |
|
Poster
|
Wed 16:30 |
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He |
|
Poster
|
Wed 11:00 |
DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA
Aman Patel · Arpita Singhal · Austin Wang · Anusri Pampari · Maya Kasowski · Anshul Kundaje |
|
Poster
|
Empowering and Assessing the Utility of Large Language Models in Crop Science
Hang Zhang · Jiawei SUN · Renqi Chen · Wei Liu · Zhonghang Yuan · Xinzhe Zheng · Zhefan Wang · Zhiyuan Yang · Hang Yan · Han-Sen Zhong · Xiqing Wang · Wanli Ouyang · Fan Yang · Nanqing Dong |
||
Affinity Event
|
A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
Yutong Zhou · Masahiro Ryo |
||
Poster
|
Thu 16:30 |
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel-Lamparth · Amelia Hardy · Chandler Smith · Max Lamparth · Malcolm Hardy · Mykel J Kochenderfer |
|
Poster
|
Thu 16:30 |
MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions
Felix Fent · Fabian Kuttenreich · Florian Ruch · Farija Rizwin · Stefan Juergens · Lorenz Lechermann · Christian Nissler · Andrea Perl · Ulrich Voll · Min Yan · Markus Lienkamp |
|
Poster
|
Wed 16:30 |
A Systematic Review of NeurIPS Dataset Management Practices
Yiwei Wu · Leah Ajmani · Shayne Longpre · Hanlin Li |
|
Poster
|
Fri 16:30 |
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti · Javier Rando · Daniel Paleka · Silaghi Florin · Dragos Albastroiu · Niv Cohen · Yuval Lemberg · Reshmi Ghosh · Rui Wen · Ahmed Salem · Giovanni Cherubin · Santiago Zanella-Beguelin · Robin Schmid · Victor Klemm · Takahiro Miki · Chenhao Li · Stefan Kraft · Mario Fritz · Florian Tramer · Sahar Abdelnabi · Lea Schönherr