The Role of Benchmarks in the Scientific Progress of Machine Learning
Lora Aroyo · Samuel Bowman · Isabelle Guyon · Joaquin Vanschoren

Wed Dec 08 07:00 AM -- 08:00 AM (PST)

Benchmark datasets have played a crucial role in driving empirical progress in machine learning, leading to an interesting dynamic between those on a quest for state-of-the-art performance and those creating new challenging benchmarks. In this panel, we reflect on how benchmarks can lead to scientific progress, both in terms of new algorithmic innovations and improved scientific understanding. First, what qualities of a machine learning system should a good benchmark dataset seek to measure? How well can benchmarks assess performance in dynamic and novel environments, or in tasks with an open-ended set of acceptable answers? Benchmarks can also raise significant ethical concerns including poor data collection practices, under- and misrepresentation of subjects, as well as misspecification of objectives. Second, even given high-quality, carefully constructed benchmarks, which research questions can we hope to answer from leaderboard-climbing, and which ones are deprioritized or impossible to answer due to the limitations of the benchmark paradigm? In general, we hope to deepen the community’s awareness of the important role of benchmarks for advancing the science of machine learning.

Author Information

Lora Aroyo (Google Research)
Samuel Bowman (New York University)
Isabelle Guyon (U. Paris-Saclay & ChaLearn)
Joaquin Vanschoren (Eindhoven University of Technology)
