28 Results
Workshop
|
Sat 15:45 |
A Framework for Evaluating LLMs Under Task Indeterminacy Luke Guerdan · Hanna Wallach · Solon Barocas · Alexandra Chouldechova |
|
Workshop
|
Sat 15:45 |
A shared standard for valid measurement of generative AI systems' capabilities, risks, and impacts Alexandra Chouldechova · Chad Atalla · Solon Barocas · A. Feder Cooper · Emily Corvi · Alex Dow · Jean Garcia-Gathright · Nicholas Pangakis · Stefanie Reed · Emily Sheng · Dan Vann · Matthew Vogel · Hannah Washington · Hanna Wallach |
|
Workshop
|
A Framework for Evaluating LLMs Under Task Indeterminacy Luke Guerdan · Hanna Wallach · Solon Barocas · Alexandra Chouldechova |
||
Workshop
|
Sat 15:45 |
Evaluating Generative AI Systems is a Social Science Measurement Challenge Hanna Wallach · Meera Desai · Nicholas Pangakis · A. Feder Cooper · Angelina Wang · Solon Barocas · Alexandra Chouldechova · Chad Atalla · Su Lin Blodgett · Emily Corvi · Alex Dow · Jean Garcia-Gathright · Alexandra Olteanu · Stefanie Reed · Emily Sheng · Dan Vann · Jennifer Wortman Vaughan · Matthew Vogel · Hannah Washington · Abigail Jacobs |
|
Workshop
|
Evaluating Generative AI Systems is a Social Science Measurement Challenge Hanna Wallach · Meera Desai · Nicholas Pangakis · A. Feder Cooper · Angelina Wang · Solon Barocas · Alexandra Chouldechova · Chad Atalla · Su Lin Blodgett · Emily Corvi · Alex Dow · Jean Garcia-Gathright · Alexandra Olteanu · Stefanie Reed · Emily Sheng · Dan Vann · Jennifer Wortman Vaughan · Matthew Vogel · Hannah Washington · Abigail Jacobs |
||
Workshop
|
Flood Prediction in Kenya - Leveraging Pre-Trained Models to Generate More Validation Data in a Sparse Observation Settings Alim Karimi · David Quispe · Hammed Akande · Nicole Mongare · Valerie Brosnan · Asbina Baral |
||
Workshop
|
Multimodal Auto Validation For Self-Refinement in Web Agents Ruhana Azam · Tamer Abuelsaad · Aditya Vempaty · Ashish Jagmohan |
||
Workshop
|
Generating and Validating Agent and Environment Code for Simulating Realistic Personality Profiles with Large Language Models Nathan Cloos · M Ganesh Kumar · Adam Manoogian · Christopher Cueva · Shawn Rhoads |
||
Workshop
|
Sun 14:00 |
Invited talk: Valid scientific inference with neural density estimators and generative models Ann Lee |
|
Competition
|
Sun 10:45 |
Compilation and Validation of the Weather Event Dataset Aleksandra Gruca |
|
Workshop
|
Sat 12:00 |
Statistically Valid Information Bottleneck via Multiple Hypothesis Testing Amirmohammad Farzaneh · Osvaldo Simeone |
|
Workshop
|
Sat 15:45 |
Estimating and Correcting for Misclassification Error in Empirical Textual Research Jonathan Choi |