NeurIPS 2024

6 Results

Workshop	Sat 15:45	Auto-Evaluation with Few Labels through Post-hoc Regression	Benjamin Eyre · David Madras
Workshop		Multimodal Auto Validation For Self-Refinement in Web Agents	Ruhana Azam · Tamer Abuelsaad · Aditya Vempaty · Ashish Jagmohan
Workshop		Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries	Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang
Workshop		Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents	Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y