Workshop
|
Sat 12:00
|
H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
Nhi Pham · Michael Schott
|
|
Poster
|
Wed 16:30
|
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
Shuo Liu · Kaining Ying · Hao Zhang · Yue Yang · Yuqi Lin · Tianle Zhang · Chuanhao Li · Yu Qiao · Ping Luo · Wenqi Shao · Kaipeng Zhang
|
|
Poster
|
Fri 11:00
|
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng · Han Wang · An Zhang · Nguyen Duy Tai · Jun Sun · Tat-Seng Chua
|
|
Workshop
|
|
ASTRID - An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Mohita Chowdhury · Yajie He · Ernest Lim · Aisling Higham
|
|
Workshop
|
|
AIR-Bench 2024: Safety Evaluation Based on Risk Categories from Regulations and Policies
Kevin Klyman
|
|
Workshop
|
|
ReFeR: A Hierarchical Framework of Models as Evaluative and Reasoning Agents
Yaswanth Narsupalli · Abhranil Chandra · Sreevatsa Muppirala · Manish Gupta · Pawan Goyal
|
|
Workshop
|
Sat 15:45
|
ReFeR: A Hierarchical Framework of Models as Evaluative and Reasoning Agents
Yaswanth Narsupalli · Abhranil Chandra · Sreevatsa Muppirala · Manish Gupta · Pawan Goyal
|
|
Workshop
|
|
Examining Distribution-based Amortized Fair Ranking
Aparna Balagopalan · Kai Wang · Asia Biega · Marzyeh Ghassemi
|
|
Workshop
|
|
What's in a Query: Examining Distribution-based Amortized Fair Ranking
Aparna Balagopalan · Kai Wang · Asia Biega · Marzyeh Ghassemi
|
|
Workshop
|
|
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
Sami Baral · Li Lucy · Ryan Knight · Alice Ng · Luca Soldaini · Neil Heffernan · Kyle Lo
|
|
Workshop
|
|
RelWire: Metric Based Rewiring
Rishi Sonthalia · Anna Gilbert · Matthew Durham
|
|
Poster
|
Wed 11:00
|
Evaluating the design space of diffusion-based generative models
Yuqing Wang · Ye He · Molei Tao
|
|