
Search All 2024 Events
 

38 Results

Workshop
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Suhas Hariharan · Zainab Ali Majid · Jaime Raldua Veuthey · Jacob Haimes
Poster
Fri 16:30 STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Shirley Wu · Shiyu Zhao · Michihiro Yasunaga · Kexin Huang · Kaidi Cao · Qian Huang · Vassilis Ioannidis · Karthik Subbian · James Zou · Jure Leskovec
Workshop
LLM-PIRATE: A benchmark for indirect prompt injection attacks in Large Language Models
Anil Ramakrishna · Jimit Majmudar · Rahul Gupta · Devamanyu Hazarika
Workshop
Sat 15:45 Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang · Zhuohan Long · Zhihao Fan · Xuanjing Huang · Zhongyu Wei
Workshop
GTA: A Benchmark for General Tool Agents
Jize Wang · Ma Zerun · Yining Li · Songyang Zhang · Cailian Chen · Kai Chen · Xinyi Le
Workshop
Sat 15:45 MarkMyWords: Analyzing and Evaluating Language Model Watermarks
Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner
Poster
Thu 11:00 MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni · Fuzhao Xue · Xiang Yue · Yuntian Deng · Mahir Shah · Kabir Jain · Graham Neubig · Yang You
Workshop
DafnyBench: A Benchmark for Formal Software Verification
Chloe Loughridge · Qinyi Sun · Seth Ahrenbach · Federico Cassano · Chuyue (Livia) Sun · Ying Sheng · Anish Mudide · Md Rakib Hossain Misu · Nada Amin · Max Tegmark
Poster
Wed 11:00 Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
Sahar Abdelnabi · Amr Gomaa · Sarath Sivaprasad · Lea Schönherr · Mario Fritz
Oral
Wed 16:10 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Poster
Fri 11:00 PrivAuditor: Benchmarking Data Protection Vulnerabilities in LLM Adaptation Techniques
Derui Zhu · Dingfan Chen · Xiongfei Wu · Jiahui Geng · Zhuo Li · Jens Grossklags · Lei Ma
Workshop
Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
Nathaniel Demchak · Xin Guan · Zekun Wu · Ziyi Xu · Adriano Koshiyama · Emre Kazim