All 2024 Events (378 results)

Poster
Wed 16:30 UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models
Yihua Zhang · Chongyu Fan · Yimeng Zhang · Yuguang Yao · Jinghan Jia · Jiancheng Liu · Gaoyuan Zhang · Gaowen Liu · Ramana Kompella · Xiaoming Liu · Sijia Liu
Poster
Wed 16:30 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Ma Chang · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He
Poster
Wed 11:00 DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA
Aman Patel · Arpita Singhal · Austin Wang · Anusri Pampari · Maya Kasowski · Anshul Kundaje
Poster
Fri 16:30 Paloma: A Benchmark for Evaluating Language Model Fit
Ian Magnusson · Akshita Bhagia · Valentin Hofmann · Luca Soldaini · Ananya Harsh Jha · Oyvind Tafjord · Dustin Schwenk · Evan Walsh · Yanai Elazar · Kyle Lo · Dirk Groeneveld · Iz Beltagy · Hanna Hajishirzi · Noah Smith · Kyle Richardson · Jesse Dodge
Poster
Thu 11:00 Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
Yiran Liu · Ke Yang · Zehan Qi · Xiao Liu · Yang Yu · Cheng Xiang Zhai
Poster
Thu 11:00 WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Yufang Hou · Alessandra Pascale · Javier Carnerero-Cano · Tigran Tchrakian · Radu Marinescu · Elizabeth Daly · Inkit Padhi · Prasanna Sattigeri
Affinity Event
Ontology Extraction and Evaluation for the Blue Amazon
Vivian Magri Alcaldi Soares · Renata Wassermann
Poster
Wed 11:00 GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps
Muhammad Umair Nasir · Steven James · Julian Togelius
Poster
Fri 11:00 InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
Linyi Li · Shijie Geng · Zhenwen Li · Yibo He · Hao Yu · Ziyue Hua · Guanghan Ning · Siwei Wang · Tao Xie · Hongxia Yang
Poster
Fri 11:00 Evaluating language models as risk scores
André F. Cruz · Moritz Hardt · Celestine Mendler-Dünner
Poster
Wed 16:30 Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
Lai Wei · Zhiquan Tan · Chenghai Li · Jindong Wang · Weiran Huang
Affinity Event
Evaluating Generative AI for Scenario Variation in Automated Driving Validation
Manasa Mariam Mammen · Zafer Kayatas · Eva Zimmermann · Pavel Nedvědický