6 Results
Workshop | Sat 15:45
Auto-Evaluation with Few Labels through Post-hoc Regression
Benjamin Eyre · David Madras

Workshop
Multimodal Auto Validation For Self-Refinement in Web Agents
Ruhana Azam · Tamer Abuelsaad · Aditya Vempaty · Ashish Jagmohan

Workshop
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang

Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y

Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y

Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y