Metrics Reloaded
Abstract
Flaws in machine learning (ML) algorithm validation are an underestimated global problem. In automatic biomedical image analysis in particular, the chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering the translation of ML techniques into practice. A large international expert consortium has now created Metrics Reloaded, a comprehensive framework guiding researchers towards problem-aware metric selection. The framework is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant to metric selection, from the domain interest to properties of the target structure(s), the data set, and the algorithm output. It supports image-level classification, object detection, semantic segmentation, and instance segmentation tasks. Users are guided through the process of selecting and applying appropriate validation metrics while being made aware of potential pitfalls. To improve the user experience, we implemented the framework in an online tool, which also provides a common point of access for exploring metric strengths and weaknesses. An instantiation of the framework for various biomedical image analysis use cases demonstrates its broad applicability across domains.