Extreme multi-label classification (XMLC) is the task of selecting a small subset of relevant labels from a very large set of possible labels. As such, it is characterized by long-tail labels, i.e., most labels have very few positive instances. With standard performance measures such as precision@k, a classifier can ignore tail labels and still report good performance. However, it is often argued that correct predictions in the tail are more "interesting" or "rewarding," but the community has not yet settled on a metric capturing this intuitive concept. The existing propensity-scored metrics fall short of this goal because they confound the problems of long-tail labels and missing labels. In this paper, we analyze generalized metrics budgeted "at k" as an alternative solution. To tackle the challenging problem of optimizing these metrics, we formulate it in the expected test utility (ETU) framework, which aims at optimizing the expected performance on a given test set. We derive optimal prediction rules and construct computationally efficient approximations of them that come with provable regret guarantees and are robust against model misspecification. Our algorithm, based on block coordinate descent, scales effortlessly to XMLC problems and obtains promising results in terms of long-tail performance.
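The claim that precision@k can hide poor tail performance can be illustrated with a small sketch. The toy label matrix, the two hypothetical predictors, and the helper names `precision_at_k` and `macro_recall_at_k` below are all assumptions made for illustration; macro-averaged recall at k is used only as one example of a label-wise metric under a budget of k predictions per instance, and is not taken from the abstract as the specific metric the paper studies.

```python
import numpy as np

def precision_at_k(y_true, y_pred):
    """Instance-wise precision@k, averaged over instances.

    Both arguments are binary (instances x labels) matrices; each row of
    y_pred is assumed to contain exactly k ones.
    """
    k = y_pred.sum(axis=1)
    hits = (y_true * y_pred).sum(axis=1)
    return float(np.mean(hits / k))

def macro_recall_at_k(y_true, y_pred):
    """Recall computed per label, then averaged with equal weight per label."""
    true_pos = (y_true * y_pred).sum(axis=0).astype(float)
    positives = y_true.sum(axis=0)
    # Labels without positives contribute 0 (a convention chosen for this sketch).
    per_label = true_pos / np.maximum(positives, 1)
    return float(per_label.mean())

# Toy data (hypothetical): label 0 is a head label relevant to every instance,
# labels 1 and 2 are tail labels with a single positive each.
Y = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
])

# Predictor A always predicts the head label (k = 1).
head_only = np.array([[1, 0, 0]] * 4)

# Predictor B spends its budget on the tail label whenever one is relevant.
tail_aware = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

for name, pred in [("head_only", head_only), ("tail_aware", tail_aware)]:
    print(name, precision_at_k(Y, pred), round(macro_recall_at_k(Y, pred), 3))
```

In this toy setting both predictors reach a precision@1 of 1.0, so precision@k cannot tell them apart, while the label-wise average separates them (about 0.333 for the head-only predictor versus about 0.833 for the tail-aware one).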
Author Information
Erik Schultheis (Aalto University)
Marek Wydmuch (Poznan University of Technology)
Wojciech Kotlowski (Poznan University of Technology, Poland)
Rohit Babbar (University of Bath)
Krzysztof Dembczynski (Yahoo Research)
More from the Same Authors
- 2022 Poster: CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-label Classification
  Siddhant Kharbanda · Atmadeep Banerjee · Erik Schultheis · Rohit Babbar
- 2022 Poster: Regret Bounds for Multilabel Classification in Sparse Label Regimes
  Róbert Busa-Fekete · Heejin Choi · Krzysztof Dembczynski · Claudio Gentile · Henry Reeve · Balazs Szorenyi
- 2017 Poster: Random Permutation Online Isotonic Regression
  Wojciech Kotlowski · Wouter Koolen · Alan Malek
- 2014 Workshop: High-energy particle physics, machine learning, and the HiggsML data challenge (HEPML)
  Glen Cowan · Balázs Kégl · Kyle Cranmer · Gábor Melis · Tim Salimans · Vladimir Vava Gligorov · Daniel Whiteson · Lester Mackey · Wojciech Kotlowski · Roberto Díaz Morales · Pierre Baldi · Cecile Germain · David Rousseau · Isabelle Guyon · Tianqi Chen
- 2011 Poster: An Exact Algorithm for F-Measure Maximization
  Krzysztof Dembczynski · Willem Waegeman · Weiwei Cheng · Eyke Hullermeier