169 Results
Workshop
|
Contextual evaluation of Large Language Models for Classifying Tropical and Infectious Diseases Mercy Asiedu · Nenad Tomasev · Chintan Ghate · Tiya Tiyasirichokchai · Awa Dieng · Oluwatosin Akande · Geoffrey Siwo · Steve Adudans · Sylvanus Aitkins · Odianosen Ehiakhamen · Eric Ndombi · Katherine Heller |
||
Workshop
|
Evaluating Chemistry Prompts for Large-Language Model Fine-Tuning Carmelo Gonzales · Michael Pieler · Kevin Maik Jablonka · Santiago Miret |
||
Poster
|
Wed 16:30 |
UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models Yihua Zhang · Chongyu Fan · Yimeng Zhang · Yuguang Yao · Jinghan Jia · Jiancheng Liu · Gaoyuan Zhang · Gaowen Liu · Ramana Kompella · Xiaoming Liu · Sijia Liu |
|
Poster
|
Fri 16:30 |
Paloma: A Benchmark for Evaluating Language Model Fit Ian Magnusson · Akshita Bhagia · Valentin Hofmann · Luca Soldaini · Ananya Harsh Jha · Oyvind Tafjord · Dustin Schwenk · Evan Walsh · Yanai Elazar · Kyle Lo · Dirk Groeneveld · Iz Beltagy · Hanna Hajishirzi · Noah Smith · Kyle Richardson · Jesse Dodge |
|
Workshop
|
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks Clara Higuera-Cabañes · Ryo Iwaki · Beñat San Sebastian · Rosario Uceda-Sosa · Manish Nagireddy · Hiroshi Kanayama · Mikio Takeuchi · Gakuto Kurata · Karthikeyan Natesan Ramamurthy |
||
Workshop
|
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models Mengfei Liang · Archish Arun · Zekun Wu · Cristian Villalobos · Jonathan Lutch · Emre Kazim · Adriano Koshiyama · Philip Treleaven |
||
Poster
|
Thu 16:30 |
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images M. Maruf · Arka Daw · Kazi Sajeed Mehrab · Harish Babu Manogaran · Abhilash Neog · Medha Sawhney · Mridul Khurana · James Balhoff · Yasin Bakis · Bahadir Altintas · Matthew Thompson · Elizabeth Campolongo · Josef Uyeda · Hilmar Lapp · Henry Bart · Paula Mabee · Yu Su · Wei-Lun (Harry) Chao · Charles Stewart · Tanya Berger-Wolf · Wasila Dahdul · Anuj Karpatne |
|
Workshop
|
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images Sami Baral · Li Lucy · Ryan Knight · Alice Ng · Luca Soldaini · Neil Heffernan · Kyle Lo |
||
Workshop
|
Had enough of experts? Elicitation and evaluation of Bayesian priors from large language models David Antony Selby · Kai Spriestersbach · Yuichiro Iwashita · Dennis Bappert · Archana Warrier · Sumantrak Mukherjee · Muhammad Asim · Koichi Kise · Sebastian Vollmer |
||
Poster
|
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations Nikhil Khandekar · Qiao Jin · Guangzhi Xiong · Soren Dunn · Serina Applebaum · Zain Anwar · Maame Sarfo-Gyamfi · Conrad Safranek · Abid Anwar · Andrew Zhang · Aidan Gilson · Maxwell Singer · Amisha Dave · Andrew Taylor · Aidong Zhang · Qingyu Chen · Zhiyong Lu |
||
Oral
|
Fri 15:30 |
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations Nikhil Khandekar · Qiao Jin · Guangzhi Xiong · Soren Dunn · Serina Applebaum · Zain Anwar · Maame Sarfo-Gyamfi · Conrad Safranek · Abid Anwar · Andrew Zhang · Aidan Gilson · Maxwell Singer · Amisha Dave · Andrew Taylor · Aidong Zhang · Qingyu Chen · Zhiyong Lu |
|
Poster
|
Wed 16:30 |
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs Zhongshen Zeng · Yinhong Liu · Yingjia Wan · Jingyao Li · Pengguang Chen · Jianbo Dai · Yuxuan Yao · Rongwu Xu · Zehan Qi · Wanru Zhao · Linling Shen · Jianqiao Lu · Haochen Tan · Yukang Chen · Hao Zhang · Zhan Shi · Bailin Wang · Zhijiang Guo · Jiaya Jia |