Skip to yearly menu bar Skip to main content

Workshop: Workshop on Machine Learning Safety

Lower Bounds on 0-1 Loss for Multi-class Classification with a Test-time Attacker

Sihui Dai · Wenxin Ding · Arjun Nitin Bhagoji · Daniel Cullina · Prateek Mittal · Ben Zhao


Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model and comparing it to that achieved by state-of-the-art training methods is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for computing lower bounds on 0-1 loss based on solving a linear program (LP). This LP is constructed based on what we introduce as a conflict hypergraph, and we explore different settings in the construction of this hypergraph and their impact on the computed lower bound. Our work enables, for the first time, an analysis of the gap to optimal robustness for classifiers in the multi-class setting.

Chat is not available.