Search All 2024 Events

38 Results (Page 1 of 4)
Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Poster
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou · Shikun Zhang · Wei Ye
Workshop
Sat 12:00 Towards Optimal Statistical Watermarking
Baihe Huang · Hanlin Zhu · Banghua Zhu · Kannan Ramchandran · Michael Jordan · Jason Lee · Jiantao Jiao
Poster
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao · Xiaoyuan Yi · Xing Xie
Workshop
Sat 12:00 A Statistical Approach to Quantifying LLM Human Alignment
Harbin Hong · Liu Leqi · Sebastian Caldas
Poster
Thu 11:00 ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence
Kevin Wu · Eric Wu · James Zou
Workshop
Towards Optimizing SQL Generation via LLM Routing
Mohammadhossein Malekpour · Nour Shaheen · Foutse Khomh · Amine Mhedhbi
Workshop
Sat 12:00 Distribution-based sensitivity analysis for large language models
Paulius Rauba · Qiyao Wei · Mihaela van der Schaar
Poster
Wed 11:00 DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
Junchao Wu · Runzhe Zhan · Derek Wong · Shu Yang · Xinyi Yang · Yulin Yuan · Lidia Chao
Workshop
Sat 12:00 Skilling laws: scaling laws for LLM benchmark performance
Felipe Maia Polo · Seamus Somerstep · Leshem Choshen · Yuekai Sun · Mikhail Yurochkin
Workshop
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Jing Jiang · Min Lin
Workshop
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
Saeid Asgari · Aliasghar Khani · Amir Khasahmadi