Poster
in
Workshop: Socially Responsible Language Modelling Research (SoLaR)

Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents

Samuel Brown ⋅ Basil Labib ⋅ Codruta Lugoj ⋅ Sai Sasank Y

Keywords: Alignment LLM LLM Agents Benchmarking AI governance NLP AI safety Self-improvement AI evaluations

Project Page [ Poster] [ OpenReview]

Abstract

LLM agents are rapidly improving at benchmarks designed to measure software development and reasoning ability. As the creation and improvement of such systems is itself a software development task, we are interested in this specific subset of ability. We describe the first steps towards a "meta-benchmark", in which a "top-level" agent aims to increase the performance of "reference agents" at tasks taken from a range of other benchmarks in the literature. We show that agents are able to alter other systems to a) better withstand prompt injection, b) fix non-trivial bugs in scaffolds, c) compare the performance of various LLMs at ``operating'' reference agents' scaffolds, and d) make progress towards model-editing tasks such as unlearning. We consider this a first step towards a broad and extensible meta-benchmark, which would allow us to measure AI self-improvement abilities.

Chat is not available.