Workshop | Sat 15:45
MarkMyWords: Analyzing and Evaluating Language Model Watermarks
Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner

Workshop
Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate
Arjun Karanam · Farnaz Jahanbakhsh · Sanmi Koyejo

Workshop
Motivations for Reframing Large Language Model Benchmarking for Legal Applications
Riya Ranjan · Megan Ma

Workshop
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Ali Shiraee Kasmaee · Mohammad Khodadad · Mohammad Arshi Saloot · Nick Sherck · Stephen Dokas · Hamidreza Mahyar · Soheila Samiee

Workshop
Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
Luxi He · Xiangyu Qi · Inyoung Cheong · Prateek Mittal · Danqi Chen · Peter Henderson

Workshop
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang · Carlos Mougan · Yulu Pi

Workshop
CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models
Zeyu Wang

Poster | Fri 11:00
Adaptive Labeling for Efficient Out-of-distribution Model Evaluation
Daksh Mittal · Yuanzhe Ma · Shalmali Joshi · Hongseok Namkoong

Workshop
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Mari Oriyad · Rezaei · Mahdieh Soleymani · Mohammad Hossein Rohban

Poster | Thu 16:30
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang · Jiaao Chen · Diyi Yang

Poster | Fri 11:00
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
Haitao Li · You Chen · Qingyao Ai · Yueyue Wu · Ruizhe Zhang · Yiqun Liu