NeurIPS 2024

Workshop

Sat 15:45

A Framework for Evaluating LLMs Under Task Indeterminacy
Luke Guerdan · Hanna Wallach · Solon Barocas · Alexandra Chouldechova

Workshop

Sat 15:45

A shared standard for valid measurement of generative AI systems' capabilities, risks, and impacts
Alexandra Chouldechova · Chad Atalla · Solon Barocas · A. Feder Cooper · Emily Corvi · Alex Dow · Jean Garcia-Gathright · Nicholas Pangakis · Stefanie Reed · Emily Sheng · Dan Vann · Matthew Vogel · Hannah Washington · Hanna Wallach

Workshop

Sat 15:45

Evaluating Generative AI Systems is a Social Science Measurement Challenge
Hanna Wallach · Meera Desai · Nicholas Pangakis · A. Feder Cooper · Angelina Wang · Solon Barocas · Alexandra Chouldechova · Chad Atalla · Su Lin Blodgett · Emily Corvi · Alex Dow · Jean Garcia-Gathright · Alexandra Olteanu · Stefanie Reed · Emily Sheng · Dan Vann · Jennifer Wortman Vaughan · Matthew Vogel · Hannah Washington · Abigail Jacobs

Workshop

Flood Prediction in Kenya - Leveraging Pre-Trained Models to Generate More Validation Data in a Sparse Observation Settings
Alim Karimi · David Quispe · Hammed Akande · Nicole Mongare · Valerie Brosnan · Asbina Baral

Workshop

Multimodal Auto Validation For Self-Refinement in Web Agents
Ruhana Azam · Tamer Abuelsaad · Aditya Vempaty · Ashish Jagmohan

Workshop

Generating and Validating Agent and Environment Code for Simulating Realistic Personality Profiles with Large Language Models
Nathan Cloos · M Ganesh Kumar · Adam Manoogian · Christopher Cueva · Shawn Rhoads

Workshop

Sun 14:00

Invited talk: Valid scientific inference with neural density estimators and generative models
Ann Lee

Competition

Sun 10:45

Compilation and Validation of the Weather Event Dataset
Aleksandra Gruca

Workshop

Sat 12:00

Statistically Valid Information Bottleneck via Multiple Hypothesis Testing
Amirmohammad Farzaneh · Osvaldo Simeone

Workshop

Sat 15:45

Estimating and Correcting for Misclassification Error in Empirical Textual Research
Jonathan Choi
