Search All 2024 Events
 

39 Results

Workshop
Sat 17:27 Benchmark to Audit LLM Generated Clinical Notes for Disparities Arising from Biases and Stereotypes
Hongyu Cai · Swetasudha Panda · Naveen Jafer Nizar · Qinlan Shen · Daeja Oxendine · Sumana Srivatsa · Krishnaram Kenthapadi
Poster
Thu 11:00 IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation
Fan Lin · Shuyi Xie · Yong Dai · Wenlin Yao · TianJiao Lang · Yu Zhang
Workshop
Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Unlearning
Zora Che · Stephen Casper · Anirudh Satheesh · Rohit Gandikota · Domenic Rosati · Stewart Slocum · Lev McKinney · Zichu Wu · Zikui Cai · Bilal Chughtai · Furong Huang · Dylan Hadfield-Menell
Workshop
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen · Siyu Yuan · Rong Ye · Bodhisattwa Prasad Majumder · Kyle Richardson
Workshop
Evaluating Explanations Through LLMs: Beyond Traditional User Studies
Francesco Bombassei De Bona · Gabriele Dominici · Tim Miller · Marc Langheinrich · Martin Gjoreski
Poster
Thu 16:30 AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Workshop
Sat 12:00 Black-box Uncertainty Quantification Method for LLM-as-a-Judge
Nico Wagner · Michael Desmond · Rahul Nair · Zahra Ashktorab · Elizabeth Daly · Qian Pan · Martín Santillán Cooper · J Johnson · Werner Geyer
Workshop
A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning
Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Workshop
Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
Nathaniel Demchak · Xin Guan · Zekun Wu · Ziyi Xu · Adriano Koshiyama · Emre Kazim
Workshop
Sat 12:00 CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++?
Vaishnavi Bhargava · Rajat Ghosh · Debojyoti Dutta
Poster
Wed 16:30 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He