Workshop: Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation

VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts (by Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan)

Anjana Arunkumar · Swaroop Mishra · Chitta Baral


We present VAIDA, a novel benchmark creation paradigm (BCP) for NLP. VAIDA gives crowdworkers real-time feedback on the quality of samples as they create them, educating them about potential artifacts and allowing them to revise samples to remove those artifacts. Concurrently, VAIDA enables backend analysts to review and approve submitted samples for benchmark inclusion, analyze the overall quality of the dataset, and resample splits to obtain and freeze the optimum state. VAIDA is domain, model, task, and metric agnostic, and constitutes a paradigm shift for robust, validated, and dynamic benchmark creation via human-and-metric-in-the-loop workflows. We demonstrate VAIDA's effectiveness by leveraging DQI (a data quality metric) over four datasets. We further evaluate VAIDA via expert review and a user study with NASA TLX. We find that VAIDA decreases the mental demand, temporal demand, effort, and frustration of crowdworkers (by 29.7%) and analysts (by 12.1%), and increases their performance by 30.8% and 26%, respectively.