Skip to yearly menu bar Skip to main content

Workshop: Workshop on Machine Learning Safety

Certifiable Robustness Against Patch Attacks Using an ERM Oracle

Kevin Stangl · Avrim Blum · Omar Montasser · Saba Ahmadi


Consider patch attacks, where at test-time an adversary manipulates a test image with a patch in order to induce a targeted mis-classification. We consider a recent defense to patch attacks, Patch-Cleanser (Xiang et al., 2022). The Patch-Cleanser algorithm requires a prediction model to have a “two-mask correctness” property, meaning that the prediction model should correctly classify any image whenany two blank masks replace portions of the image. To this end, Xiang et al. (2022) learn a prediction model to be robust to two-mask operations by augmenting the training set by adding pairs of masks at random locations of training images, and performing empirical risk minimization (ERM) on the augmented dataset. However, in the non-realizable setting when no predictor is perfectly correct on all two-mask operations on all images, we exhibit an example where ERM fails. To overcome this challenge, we propose a different algorithm that provably learns a predictor robust to all two-mask operations using an ERM oracle, based on prior work by Feige et al. (2015a) .

Chat is not available.