Timezone: »
This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approch is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large scale multilabel datasets.
Author Information
Moustapha M Cisse (KAUST)
Nicolas Usunier (Université de Technologie de Compiègne (UTC))
Thierry Artières (LIS - Ecole Centrale Marseille / Aix Marseille Université / CNRS)
Patrick Gallinari (Sorbonne Universite, Criteo AI Lab)
More from the Same Authors
-
2020 Poster: Normalizing Kalman Filters for Multivariate Time Series Analysis »
Emmanuel de Bézenac · Syama Sundar Rangapuram · Konstantinos Benidis · Michael Bohlke-Schneider · Richard Kurle · Lorenzo Stella · Hilaf Hasson · Patrick Gallinari · Tim Januschowski -
2015 Workshop: Extreme Classification 2015: Multi-class and Multi-label Learning in Extremely Large Label Spaces »
Manik Varma · Moustapha M Cisse -
2014 Poster: Optimizing F-Measures by Cost-Sensitive Classification »
Shameem Puthiya Parambath · Nicolas Usunier · Yves Grandvalet -
2013 Workshop: Neural Information Processing Scaled for Bioacoustics : NIPS4B »
Hervé GLOTIN · Yann LeCun · Thierry Artières · Stephane Mallat · Ofer Tchernichovski · Xanadu Halkias -
2013 Poster: Translating Embeddings for Modeling Multi-relational Data »
Antoine Bordes · Nicolas Usunier · Alberto Garcia-Duran · Jason Weston · Oksana Yakhnenko -
2012 Poster: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking »
Clément Calauzènes · Nicolas Usunier · Patrick Gallinari -
2012 Oral: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking »
Clément Calauzènes · Nicolas Usunier · Patrick Gallinari