Timezone: »

Robust Bloom Filters for Large MultiLabel Classification Tasks
Moustapha M Cisse · Nicolas Usunier · Thierry Artières · Patrick Gallinari

Sat Dec 07 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor #None

This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approch is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large scale multilabel datasets.

Author Information

Moustapha M Cisse (KAUST)
Nicolas Usunier (Université de Technologie de Compiègne (UTC))
Thierry Artières (LIS - Aix Marseille Université)
Patrick Gallinari (Sorbonne University & Criteo AI Lab, Paris)

More from the Same Authors