Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents
Abstract
We present Lark, a biologically inspired decision-making framework that couples LLM-driven reasoning with an evolutionary, stakeholder-aware Multi-Agent System (MAS). To address verbosity and stakeholder trade-offs, we integrate four mechanisms: (i) plasticity, which applies concise, context-sensitive adjustments to candidate solutions; (ii) duplication and maturation, which copy high-performing candidates and specialize them into new modules; (iii) ranked-choice stakeholder aggregation using influence-weighted Borda scoring; and (iv) compute awareness via token-based penalties that reward brevity and update an efficiency metric each generation. In each generation, the system proposes diverse strategies, applies plasticity tweaks, simulates stakeholder evaluations, aggregates preferences, selects the top candidates, and performs duplication/maturation, factoring compute cost into the final scores. In a controlled evaluation comparing 14 systems over 30 rounds, Lark Full achieves a mean rank of \(2.55\) (95\% CI \([2.17, 2.93]\)) and a mean composite score of \(29.4/50\) (95\% CI \([26.34, 32.46]\)), finishing in the Top-3 in 80\% of rounds while remaining cost-competitive with leading commercial models (\$0.016 per task). Paired Wilcoxon tests confirm that all four mechanisms contribute significantly: ablating duplication/maturation yields the largest deficit (\(\Delta\)Score \(= 3.5\), Cohen's \(d_z = 2.53\), \(p < 0.001\)), followed by plasticity (\(\Delta\)Score \(= 3.4\), \(d_z = 1.86\)), ranked-choice voting (\(\Delta\)Score \(= 2.4\), \(d_z = 1.20\)), and token penalties (\(\Delta\)Score \(= 2.2\), \(d_z = 1.63\)). Lark is not formulated as a Markov Decision Process (MDP) with constrained optimization; instead, it is a practical, compute-aware neuroevolutionary loop that scales stakeholder-aligned strategy generation and makes its trade-offs transparent through per-step metrics and final analyses. These are proof-of-concept findings, and we invite community feedback as we move toward real-world validation studies.
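To make mechanisms (iii) and (iv) concrete, the sketch below combines an influence-weighted Borda count with a token-based penalty and then selects the top candidates. The abstract names these mechanisms but does not give their formulas, so several details here are assumptions for illustration only. The linear penalty form, the `penalty_per_1k` rate, the stakeholder names, and the influence weights are all hypothetical, as are the function names `weighted_borda`, `compute_aware_scores`, and `select_top`. This is not Lark's actual implementation.

```python
from typing import Dict, List


def weighted_borda(rankings: Dict[str, List[str]],
                   influence: Dict[str, float]) -> Dict[str, float]:
    """Influence-weighted Borda count.

    rankings: stakeholder -> candidate IDs ordered best-first.
    influence: stakeholder -> non-negative weight (hypothetical values).
    A candidate in position k of an n-item ranking earns (n - 1 - k) points,
    multiplied by that stakeholder's influence weight.
    """
    scores: Dict[str, float] = {}
    for stakeholder, ranking in rankings.items():
        n = len(ranking)
        weight = influence.get(stakeholder, 0.0)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0.0) + weight * (n - 1 - position)
    return scores


def compute_aware_scores(borda: Dict[str, float],
                         tokens: Dict[str, int],
                         penalty_per_1k: float) -> Dict[str, float]:
    """Subtract an assumed linear token penalty from each aggregated score."""
    return {c: s - penalty_per_1k * tokens[c] / 1000 for c, s in borda.items()}


def select_top(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring candidates, breaking ties by candidate ID."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Hypothetical example: three stakeholders rank three candidate strategies.
rankings = {
    "customer": ["A", "B", "C"],
    "regulator": ["B", "C", "A"],
    "engineering": ["A", "C", "B"],
}
influence = {"customer": 0.5, "regulator": 0.3, "engineering": 0.2}
tokens = {"A": 1200, "B": 400, "C": 300}  # tokens consumed per candidate

borda = weighted_borda(rankings, influence)  # A ~1.4, B ~1.1, C ~0.5
final = compute_aware_scores(borda, tokens, penalty_per_1k=0.5)  # A ~0.8, B ~0.9, C ~0.35
print(select_top(borda, 2), select_top(final, 2))
```

In this toy case, candidate A wins the preference vote alone, but its higher token cost lets the more concise candidate B overtake it once the compute penalty is applied. This is the kind of brevity-versus-preference trade-off that mechanism (iv) is meant to expose.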