Workshop
|
Sat 15:45
|
ReFeR: A Hierarchical Framework of Models as Evaluative and Reasoning Agents
Yaswanth Narsupalli · Abhranil Chandra · Sreevatsa Muppirala · Manish Gupta · Pawan Goyal
|
|
Poster
|
|
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou · Shikun Zhang · Wei Ye
|
|
Affinity Event
|
|
Reasoning-Driven Jury System for LLM Evaluation
Ayda Sultan
|
|
Affinity Event
|
|
LLM Unlearning EKG: Evaluations using Knowledge Graphs
Rushali Mohbe · Samuel Scarpino
|
|
Workshop
|
Sat 12:00
|
A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics
Yun Joon Soh · Jishen Zhao
|
|
Poster
|
|
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao · Xiaoyuan Yi · Xing Xie
|
|
Workshop
|
|
Not All LLM Reasoners Are Created Equal
Arian Hosseini · Alessandro Sordoni · Daniel Toyama · Aaron Courville · Rishabh Agarwal
|
|
Workshop
|
Sat 15:45
|
MarkMyWords: Analyzing and Evaluating Language Model Watermarks
Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner
|
|
Workshop
|
Sat 15:45
|
Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection
Giorgos Iacovides · Wuyang Zhou · Danilo Mandic
|
|
Workshop
|
|
Multimodal Auto Validation For Self-Refinement in Web Agents
Ruhana Azam · Tamer Abuelsaad · Aditya Vempaty · Ashish Jagmohan
|
|
Oral
|
Thu 10:20
|
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery · Samuel Bowman · Shi Feng
|
|
Workshop
|
|
Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset
Shankar Kumar Jeyakumar · Alaa Ahmad · Adrian Gabriel
|
|