Workshop: Workshop on Machine Learning Safety

Image recognition time for humans predicts adversarial vulnerability for models

David Mayo · Jesse Cummings · Xinyu Lin · Boris Katz · Andrei Barbu


The success of adversarial attacks and the performance tradeoffs made by adversarial defense methods have both traditionally been evaluated on image test sets constructed from a randomly sampled, held-out portion of a training set. Mayo et al. (2022) [1] measured the difficulty of the ImageNet and ObjectNet test sets as the minimum viewing time a human needs, on average, to recognize an object, and found that these test sets are heavily skewed towards easy, quickly recognized images. Although difficult images that require longer viewing times are uncommon in test sets, they are both common in the real world and critically important to the real-world performance of vision models. In this work, we investigate the relationship between adversarial robustness and viewing-time difficulty. Measuring the area under the curve (AUC) of accuracy versus attack strength (epsilon), we find that easy, quickly recognized images are more robust to adversarial attacks than difficult images, which require several seconds of viewing time to recognize. Additionally, adversarial defense methods improve models' robustness to adversarial attacks significantly more on easy images than on hard images. We propose that the distribution of image difficulties should be carefully considered and controlled for, both when measuring the effectiveness of adversarial attacks and when analyzing the clean accuracy vs. robustness tradeoff made by adversarial defense methods.
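
The robustness metric described above, the AUC of accuracy versus attack strength, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `robustness_auc`, the trapezoidal integration, the normalization by the epsilon range, and all numeric values are assumptions made for the example and are not results from the paper.

```python
def robustness_auc(epsilons, accuracies):
    """Area under the accuracy-vs-epsilon curve, by the trapezoidal rule.

    Assumption (not stated in the abstract): the area is divided by the
    epsilon range, so a model that keeps accuracy 1.0 at every attack
    strength scores 1.0.
    """
    if len(epsilons) != len(accuracies):
        raise ValueError("epsilons and accuracies must have the same length")
    if len(epsilons) < 2:
        raise ValueError("need at least two points to integrate")

    # Sort the points by attack strength so the integration runs left to right.
    points = sorted(zip(epsilons, accuracies))

    span = points[-1][0] - points[0][0]
    if span <= 0:
        raise ValueError("epsilons must cover a non-zero range")

    area = 0.0
    for (eps_lo, acc_lo), (eps_hi, acc_hi) in zip(points, points[1:]):
        # Trapezoid between two neighbouring attack strengths.
        area += (eps_hi - eps_lo) * (acc_lo + acc_hi) / 2.0

    return area / span


# Hypothetical placeholder values, chosen only to show the call pattern.
# One accuracy curve per viewing-time difficulty bin, same epsilon grid.
epsilons = [0.0, 2.0, 4.0, 8.0]          # attack strength, e.g. in units of 1/255
easy_accuracy = [0.8, 0.6, 0.4, 0.1]     # made-up curve for quickly recognized images
hard_accuracy = [0.5, 0.2, 0.1, 0.0]     # made-up curve for slowly recognized images

easy_auc = robustness_auc(epsilons, easy_accuracy)
hard_auc = robustness_auc(epsilons, hard_accuracy)
```

Computing one AUC per difficulty bin, rather than one over the whole test set, is what lets robustness be compared across easy and hard images while holding the epsilon grid fixed.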
