Skip to yearly menu bar Skip to main content


Search All 2024 Events
 

169 Results

<<   <   Page 14 of 15   >   >>
Workshop
Contextual evaluation of Large Language Models for Classifying Tropical and Infectious Diseases
Mercy Asiedu · Nenad Tomasev · Chintan Ghate · Tiya Tiyasirichokchai · Awa Dieng · Oluwatosin Akande · Geoffrey Siwo · Steve Adudans · Sylvanus Aitkins · Odianosen Ehiakhamen · Eric Ndombi · Katherine Heller
Workshop
Evaluating Chemistry Prompts for Large-Language Model Fine-Tuning
Carmelo Gonzales · Michael Pieler · Kevin Maik Jablonka · Santiago Miret
Poster
Wed 16:30 UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models
Yihua Zhang · Chongyu Fan · Yimeng Zhang · Yuguang Yao · Jinghan Jia · Jiancheng Liu · Gaoyuan Zhang · Gaowen Liu · Ramana Kompella · Xiaoming Liu · Sijia Liu
Poster
Fri 16:30 Paloma: A Benchmark for Evaluating Language Model Fit
Ian Magnusson · Akshita Bhagia · Valentin Hofmann · Luca Soldaini · Ananya Harsh Jha · Oyvind Tafjord · Dustin Schwenk · Evan Walsh · Yanai Elazar · Kyle Lo · Dirk Groeneveld · Iz Beltagy · Hanna Hajishirzi · Noah Smith · Kyle Richardson · Jesse Dodge
Workshop
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks
Clara Higuera-Cabañes · Ryo Iwaki · Beñat San Sebastian · ROSARIO UCEDA-SOSA · Manish Nagireddy · Hiroshi Kanayama · Mikio Takeuchi · Gakuto Kurata · Karthikeyan Natesan Ramamurthy
Workshop
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models
Mengfei Liang · Archish Arun · Zekun Wu · CRISTIAN VILLALOBOS · Jonathan Lutch · Emre Kazim · Adriano Koshiyama · Philip Treleaven
Poster
Thu 16:30 VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
M. Maruf · Arka Daw · Kazi Sajeed Mehrab · Harish Babu Manogaran · Abhilash Neog · Medha Sawhney · Mridul Khurana · James Balhoff · Yasin Bakis · Bahadir Altintas · Matthew Thompson · Elizabeth Campolongo · Josef Uyeda · Hilmar Lapp · Henry Bart · Paula Mabee · Yu Su · Wei-Lun (Harry) Chao · Charles Stewart · Tanya Berger-Wolf · Wasila Dahdul · Anuj Karpatne
Workshop
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
Sami Baral · Li Lucy · Ryan Knight · Alice Ng · Luca Soldaini · Neil Heffernan · Kyle Lo
Workshop
Had enough of experts? Elicitation and evaluation of Bayesian priors from large language models
David Antony Selby · Kai Spriestersbach · Yuichiro Iwashita · Dennis Bappert · Archana Warrier · Sumantrak Mukherjee · Muhammad Asim · Koichi Kise · Sebastian Vollmer
Poster
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
Nikhil Khandekar · Qiao Jin · Guangzhi Xiong · Soren Dunn · Serina Applebaum · Zain Anwar · Maame Sarfo-Gyamfi · Conrad Safranek · Abid Anwar · Andrew Zhang · Aidan Gilson · Maxwell Singer · Amisha Dave · Anrew Taylor · Aidong Zhang · Qingyu Chen · Zhiyong Lu
Oral
Fri 15:30 MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
Nikhil Khandekar · Qiao Jin · Guangzhi Xiong · Soren Dunn · Serina Applebaum · Zain Anwar · Maame Sarfo-Gyamfi · Conrad Safranek · Abid Anwar · Andrew Zhang · Aidan Gilson · Maxwell Singer · Amisha Dave · Anrew Taylor · Aidong Zhang · Qingyu Chen · Zhiyong Lu
Poster
Wed 16:30 MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Zhongshen Zeng · Yinhong Liu · Yingjia Wan · Jingyao Li · Pengguang Chen · Jianbo Dai · Yuxuan Yao · Rongwu Xu · Zehan Qi · Wanru Zhao · Linling Shen · Jianqiao Lu · Haochen Tan · Yukang Chen · Hao Zhang · Zhan Shi · Bailin Wang · Zhijiang Guo · Jiaya Jia