Search All 2024 Events
39 Results

Page 1 of 4
Poster
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou · Shikun Zhang · Wei Ye
Affinity Event
Reasoning-Driven Jury System for LLM Evaluation
Ayda Sultan
Affinity Event
LLM Unlearning EKG: Evaluations using Knowledge Graphs
Rushali Mohbe · Samuel Scarpino
Workshop
Not All LLM Reasoners Are Created Equal
Arian Hosseini · Alessandro Sordoni · Daniel Toyama · Aaron Courville · Rishabh Agarwal
Workshop
Sat 15:45 MarkMyWords: Analyzing and Evaluating Language Model Watermarks
Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner
Poster
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao · Xiaoyuan Yi · Xing Xie
Oral
Thu 10:20 LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery · Samuel Bowman · Shi Feng
Workshop
Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset
Shankar Kumar Jeyakumar · Alaa Ahmad · Adrian Gabriel
Workshop
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang · Carlos Mougan · Yulu Pi
Poster
Thu 11:00 LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery · Samuel Bowman · Shi Feng
Workshop
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Suhas Hariharan · Zainab Ali Majid · Jaime Raldua Veuthey · Jacob Haimes
Workshop
Analyzing Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark · Govind Pimpale · Arjun Panickssery · Marius Hobbhahn · Jérémy Scheurer