Poster | Thu 11:00 | IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering | Ruosen Li · Ruochen Li · Barry Wang · Xinya Du
Affinity Event | | Evaluating Multilingual Dense Embedding Models and a Sparse Model for Information Retrieval in Yoruba: A Comparative Study | Adejumobi Joshua · Anthony Soronnadi · Olubayo Adekanmbi
Workshop | | Critical Evaluation of Time Series Foundation Models in Demand Forecasting | Santosh Puvvada · Satyajit Chaudhuri
Workshop | Sat 10:25 | Contributed talk: Evaluating and Mitigating Discrimination in Language Model Decisions | Alex Tamkin
Workshop | Sat 14:45 | Contributed talk: Evaluating Gender Bias Transfer between Pre-trained and Prompt Adapted Language Models | Natalie Mackraz
Workshop | | CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models | ZEYU WANG
Workshop | | AIR-Bench 2024: Safety Evaluation Based on Risk Categories from Regulations and Policies | Kevin Klyman
Affinity Event | | Evaluating the Geometric Consistency of Text-to-3D generated models using Surface Normal Analysis | Samridha Murali · Aswath Muthuselvam
Poster | Thu 11:00 | PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action | Yijia Shao · Tianshi Li · Weiyan Shi · Yanchen Liu · Diyi Yang
Poster | Fri 16:30 | Efficient Lifelong Model Evaluation in an Era of Rapid Progress | Ameya Prabhu · Vishaal Udandarao · Philip Torr · Matthias Bethge · Adel Bibi · Samuel Albanie
Poster | Wed 11:00 | DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA | Aman Patel · Arpita Singhal · Austin Wang · Anusri Pampari · Maya Kasowski · Anshul Kundaje
Workshop | | On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning | Soheila Samiee · Anton Thielmann