39 Results
Workshop | | Benchmark to Audit LLM Generated Clinical Notes for Disparities Arising from Biases and Stereotypes Hongyu Cai · Swetasudha Panda · Naveen Jafer Nizar · Qinlan Shen · Daeja Oxendine · Sumana Srivatsa · Krishnaram Kenthapadi
Workshop | Sat 17:27 | Benchmark to Audit LLM Generated Clinical Notes for Disparities Arising from Biases and Stereotypes Hongyu Cai · Swetasudha Panda · Naveen Jafer Nizar · Qinlan Shen · Daeja Oxendine · Sumana Srivatsa · Krishnaram Kenthapadi
Poster | Thu 11:00 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation Fan Lin · Shuyi Xie · Yong Dai · Wenlin Yao · TianJiao Lang · Yu Zhang
Workshop | | Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Unlearning Zora Che · Stephen Casper · Anirudh Satheesh · Rohit Gandikota · Domenic Rosati · Stewart Slocum · Lev McKinney · Zichu Wu · Zikui Cai · Bilal Chughtai · Furong Huang · Dylan Hadfield-Menell
Workshop | | Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena Jiangjie Chen · Siyu Yuan · Rong Ye · Bodhisattwa Prasad Majumder · Kyle Richardson
Workshop | | Evaluating Explanations Through LLMs: Beyond Traditional User Studies Francesco Bombassei De Bona · Gabriele Dominici · Tim Miller · Marc Langheinrich · Martin Gjoreski
Poster | Thu 16:30 | AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Workshop | Sat 12:00 | Black-box Uncertainty Quantification Method for LLM-as-a-Judge Nico Wagner · Michael Desmond · Rahul Nair · Zahra Ashktorab · Elizabeth Daly · Qian Pan · Martín Santillán Cooper · J Johnson · Werner Geyer
Workshop | | A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Workshop | | Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks Nathaniel Demchak · Xin Guan · Zekun Wu · Ziyi Xu · Adriano Koshiyama · Emre Kazim
Workshop | Sat 12:00 | CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++? Vaishnavi Bhargava · Rajat Ghosh · Debojyoti Dutta
Poster | Wed 16:30 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He