Search All 2024 Events
 

38 Results

Workshop
Sat 17:27 Benchmark to Audit LLM Generated Clinical Notes for Disparities Arising from Biases and Stereotypes
Hongyu Cai · Swetasudha Panda · Naveen Jafer Nizar · Qinlan Shen · Daeja Oxendine · Sumana Srivatsa · Krishnaram Kenthapadi
Poster
Wed 16:30 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Workshop
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli · Mat Allen · Roman Hauksson · Grace Sodunke · Suhas Hariharan · Carlson Cheng · Wenjie Li · Joshua Clymer · Arjun Yadav
Workshop
Sat 15:45 Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks
Rachel Longjohn · Giri Gopalan · Emily Casleton
Poster
Thu 16:30 AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Workshop
Statistical Bias in Bias Benchmark Design
Hannah Powers · Ioana Baldini · Dennis Wei · Kristin P Bennett
Workshop
MaCBench: A multimodal chemistry and materials science benchmark
Nawaf Alampara · Indrajeet Mandal · Pranav Khetarpal · Hargun Grover · Mara Schilling-Wilhelmi · N M Anoop Krishnan · Kevin Maik Jablonka
Poster
Thu 16:30 Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Mucong Ding · Chenghao Deng · Jocelyn Choo · Zichu Wu · Aakriti Agrawal · Avi Schwarzschild · Tianyi Zhou · Tom Goldstein · John Langford · Animashree Anandkumar · Furong Huang
Poster
Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)
Jakob Hauser · Dániel Kondor · Jenny Reddish · Majid Benam · Enrico Cioni · Federica Villa · James Bennett · Daniel Hoyer · Pieter Francois · Peter Turchin · R. Maria del Rio-Chanona
Workshop
Sat 12:00 CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++?
Vaishnavi Bhargava · Rajat Ghosh · Debojyoti Dutta
Poster
Thu 16:30 EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon · Jiyoun Kim · Heeyoung Kwak · Dongchul Cha · Hangyul Yoon · Kwang Kim · Jeewon Yang · Seunghyun Won · Edward Choi