188 Results
Workshop
|
Sat 8:50 |
Self-Evaluation Improves Selective Generation in Large Language Models Jie Ren · Yao Zhao · Tu Vu · Peter Liu · Balaji Lakshminarayanan |
|
Workshop
|
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders David Bruns-Smith · Angela Zhou |
||
Poster
|
Tue 15:15 |
Self-Evaluation Guided Beam Search for Reasoning Yuxi Xie · Kenji Kawaguchi · Yiran Zhao · James Xu Zhao · Min-Yen Kan · Junxian He · Michael Xie |
|
Workshop
|
Re-evaluating Retrosynthesis Algorithms with Syntheseus Krzysztof Maziarz · Austin Tripp · Guoqing Liu · Megan J Stanley · Shufang Xie · Piotr Gaiński · Philipp Seidl · Marwin Segler |
||
Workshop
|
Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI Elena Sizikova · Niloufar Saharkhiz · Diksha Sharma · Miguel Lago · Berkman Sahiner · Jana Delfino · Aldo Badano |
||
Workshop
|
MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft Haowei Lin · Zihao Wang · Jianzhu Ma · Yitao Liang |
||
Workshop
|
Structure-based and leakage-free data splits for rigorous protein function evaluation Charlotte Rochereau · Mohammed AlQuraishi · Arthur Valentin · Gergo Nikolenyi |
||
Workshop
|
Paper 44: Evaluating ChatGPT-generated Textbook Questions using IRT Shreya Bhandari · Yunting Liu · Zachary Pardos |
||
Workshop
|
SCIBENCH: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models Xiaoxuan Wang · Ziniu Hu · Pan Lu · Yanqiao Zhu · Jieyu Zhang · Satyen Subramaniam · Arjun Loomba · Shichang Zhang · Yizhou Sun · Wei Wang |
||
Workshop
|
Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models Yujin Kim · Jaehong Yoon · Seonghyeon Ye · Sung Ju Hwang · Se-Young Yun |
||
Workshop
|
An International Consortium for AI Risk Evaluations Ross Gruetzemacher · Alan Chan · Štěpán Los · Kevin Frazier · Simeon Campos · Matija Franklin · José Hernández-Orallo · James Fox · Christin Manning · Philip M Tomei · Kyle Kilian |
||
Workshop
|
Evaluating AI-guided Design for Scientific Discovery Michael Pekala · Elizabeth Pogue · Alexander New · Gregory Bassen · Janna Domenico · Tyrel McQueen · Christopher Stiles |