

Search All 2024 Events

169 Results

Page 3 of 15
Workshop
Sat 15:45 MarkMyWords: Analyzing and Evaluating Language Model Watermarks
Julien Piet · Chawin Sitawarin · Vivian Fang · Norman Mu · David Wagner
Workshop
Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate
Arjun Karanam · Farnaz Jahanbakhsh · Sanmi Koyejo
Workshop
Motivations for Reframing Large Language Model Benchmarking for Legal Applications
Riya Ranjan · Megan Ma
Workshop
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Ali Shiraee Kasmaee · Mohammad Khodadad · Mohammad Arshi Saloot · Nick Sherck · Stephen Dokas · Hamidreza Mahyar · Soheila Samiee
Workshop
Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
Luxi He · Xiangyu Qi · Inyoung Cheong · Prateek Mittal · Danqi Chen · Peter Henderson
Workshop
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang · Carlos Mougan · Yulu Pi
Workshop
CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models
Zeyu Wang
Poster
Fri 11:00 Adaptive Labeling for Efficient Out-of-distribution Model Evaluation
Daksh Mittal · Yuanzhe Ma · Shalmali Joshi · Hongseok Namkoong
Workshop
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Mari Oriyad · Rezaei · Mahdieh Soleymani · Mohammad Hossein Rohban
Poster
Thu 16:30 DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang · Jiaao Chen · Diyi Yang
Poster
Fri 11:00 LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
Haitao Li · You Chen · Qingyao Ai · Yueyue Wu · Ruizhe Zhang · Yiqun Liu