Closed-Task Validation: A More Robust and Efficient Proxy for Guiding VLM Training
Abstract
Reliable and efficient validation is critical for guiding the resource-intensive process of training Vision-Language Models (VLMs). However, the standard evaluation paradigm, which relies on open-ended text generation, exhibits significant methodological limitations. We empirically demonstrate that this approach is unreliable, yielding high-variance metrics with a negligible correlation (r = 0.061) to final model performance. Furthermore, it is inefficient: auto-regressive decoding introduces substantial latency and severe load-balancing issues in parallel evaluation. To address these limitations, we propose "Closed-Task" validation, a paradigm that bypasses auto-regressive decoding by converting questions into a multiple-choice format and directly inspecting the token probabilities the model assigns to each answer option. Our experiments show this method is both highly reliable, producing stable signals strongly correlated (r = 0.798) with final performance, and efficient, achieving a >10x latency reduction with near-perfect load balancing. This work thus provides a validation methodology that jointly resolves the interconnected challenges of evaluation reliability and system efficiency, offering a superior empirical framework for VLM development.
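To make the core mechanism concrete, the sketch below illustrates one way closed-task validation can be implemented with a Hugging Face causal model: the question and its options are formatted as a multiple-choice prompt, a single forward pass yields next-token logits, and the predicted answer is the option letter with the highest probability. This is a minimal illustration, not the paper's reference implementation; the model name "gpt2" is a stand-in for the VLM under validation, the prompt template and the score_options helper are hypothetical, and it assumes each option letter maps to a single token.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the VLM checkpoint being validated
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score_options(question: str, options: dict[str, str]) -> str:
    """Pick an answer by comparing next-token probabilities of the option
    letters after a multiple-choice prompt -- one forward pass, no
    auto-regressive decoding."""
    prompt = (
        question
        + "\n"
        + "\n".join(f"{letter}. {text}" for letter, text in options.items())
        + "\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    # Assumes " A", " B", ... each tokenize to a single token (true for
    # GPT-2-style BPE vocabularies; verify for the tokenizer in use).
    letter_ids = {
        letter: tokenizer.encode(" " + letter, add_special_tokens=False)[0]
        for letter in options
    }
    return max(letter_ids, key=lambda l: next_token_probs[letter_ids[l]].item())

predicted = score_options(
    "What color is the sky on a clear day?",
    {"A": "green", "B": "blue", "C": "red"},
)
print(predicted)  # expected: "B"

Because the prediction reduces to reading a fixed number of logits from one forward pass, every validation example has near-identical cost, which is what yields the uniform per-example latency and near-perfect load balancing described above.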