
Invited Talk
Workshop: CiML 2019: Machine Learning Competitions for All

Frank Hutter (University of Freiburg) "A Proposal for a New Competition Design Emphasizing Scientific Insights"

Frank Hutter


The typical setup in machine learning competitions is to provide one or more datasets and a performance metric, leaving it entirely up to participants to decide which approach to use, how to engineer better features, whether and how to pretrain models on related data, how to tune hyperparameters, how to combine multiple models in an ensemble, and so on. Because work on each of these components often leads to substantial improvements, several consequences follow: (1) among several skilled teams, the one with the most manpower and engineering drive often wins; (2) it is often unclear why one entry performs better than another; and (3) scientific insights remain limited.

Based on my experience both participating in several challenges and organizing some, I will propose a new competition design that instead emphasizes scientific insight by dividing the various ways in which teams could improve performance into (largely orthogonal) modular components, each of which defines its own competition. For example, one could run a competition focussing only on effective hyperparameter tuning of a given pipeline (across private datasets). With the same code base and datasets, one could likewise run a competition focussing only on finding better neural architectures, or only better preprocessing methods, or only a better training pipeline, or only better pretraining methods, and so on. One could also run several of these competitions in parallel, hot-swapping better components found in one competition into the others. I will argue that the result would likely be substantially more valuable in terms of scientific insights than traditional competitions and may even lead to better final performance.
