49 Results
Workshop
|
Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset Shankar Kumar Jeyakumar · Alaa Ahmad · Adrian Gabriel |
||
Workshop
|
Sat 15:45 |
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation Siyuan Wang · Zhuohan Long · Zhihao Fan · Xuanjing Huang · Zhongyu Wei |
|
Workshop
|
Adversarial Negotiation Dynamics in Generative Language Models Arinbjörn Kolbeinsson · Benedikt Kolbeinsson |
||
Poster
|
Thu 16:30 |
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph Zhehao Zhang · Jiaao Chen · Diyi Yang |
|
Poster
|
Fri 11:00 |
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective Mingxiang Liao · Hannan Lu · Qixiang Ye · Wangmeng Zuo · Fang Wan · Tianyu Wang · Yuzhong Zhao · Jingdong Wang · Xinyu Zhang |
|
Affinity Event
|
Reasoning-Driven Jury System for LLM Evaluation Ayda Sultan |
||
Workshop
|
Simple LLM Compression Recovery Using Dynamic Prompting with Theoretical Analysis Duc Hoang · Minsik Cho · Thomas Merth · Mohammad Rastegari · Zhangyang "Atlas" Wang |
||
Poster
|
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types Yutao Mou · Shikun Zhang · Wei Ye |
||
Affinity Event
|
LLM Unlearning EKG: Evaluations using Knowledge Graphs Rushali Mohbe · Samuel Scarpino |
||
Poster
|
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses Jing Yao · Xiaoyuan Yi · Xing Xie |
||
Workshop
|
Sat 15:45 |
MarkMyWords: Analyzing and Evaluating Language Model Watermarks Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner |
|
Workshop
|
Not All LLM Reasoners Are Created Equal Arian Hosseini · Alessandro Sordoni · Daniel Toyama · Aaron Courville · Rishabh Agarwal |