38 Results
| Type | Time | Title | Authors |
|---|---|---|---|
| Workshop | | Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique | Suhas Hariharan · Zainab Ali Majid · Jaime Raldua Veuthey · Jacob Haimes |
| Poster | Fri 16:30 | STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | Shirley Wu · Shiyu Zhao · Michihiro Yasunaga · Kexin Huang · Kaidi Cao · Qian Huang · Vassilis Ioannidis · Karthik Subbian · James Zou · Jure Leskovec |
| Workshop | | LLM-PIRATE: A benchmark for indirect prompt injection attacks in Large Language Models | Anil Ramakrishna · Jimit Majmudar · Rahul Gupta · Devamanyu Hazarika |
| Workshop | Sat 15:45 | Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | Siyuan Wang · Zhuohan Long · Zhihao Fan · Xuanjing Huang · Zhongyu Wei |
| Workshop | | GTA: A Benchmark for General Tool Agents | Jize Wang · Ma Zerun · Yining Li · Songyang Zhang · Cailian Chen · Kai Chen · Xinyi Le |
| Workshop | Sat 15:45 | MarkMyWords: Analyzing and Evaluating Language Model Watermarks | Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner |
| Poster | Thu 11:00 | MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures | Jinjie Ni · Fuzhao Xue · Xiang Yue · Yuntian Deng · Mahir Shah · Kabir Jain · Graham Neubig · Yang You |
| Workshop | | DafnyBench: A Benchmark for Formal Software Verification | Chloe Loughridge · Qinyi Sun · Seth Ahrenbach · Federico Cassano · Chuyue (Livia) Sun · Ying Sheng · Anish Mudide · Md Rakib Hossain Misu · Nada Amin · Max Tegmark |
| Poster | Wed 11:00 | Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation | Sahar Abdelnabi · Amr Gomaa · Sarath Sivaprasad · Lea Schönherr · Mario Fritz |
| Oral | Wed 16:10 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He |
| Poster | Fri 11:00 | PrivAuditor: Benchmarking Data Protection Vulnerabilities in LLM Adaptation Techniques | Derui Zhu · Dingfan Chen · Xiongfei Wu · Jiahui Geng · Zhuo Li · Jens Grossklags · Lei Ma |
| Workshop | | Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks | Nathaniel Demchak · Xin Guan · Zekun Wu · Ziyi Xu · Adriano Koshiyama · Emre Kazim |