Workshop: Workshop on Machine Learning Safety

Certified defences hurt generalisation

Piersilvio De Bartolomeis · Jacob Clarysse · Fanny Yang · Amartya Sanyal


In recent years, much work has been devoted to designing certified defences for neural networks, i.e., methods for learning neural networks that are provably robust to certain adversarial perturbations. Due to the non-convexity of the problem, dominant approaches in this area rely on convex approximations, which are inherently loose. In this paper, we question the effectiveness of such approaches for realistic computer vision tasks. First, we provide extensive empirical evidence to show that certified defences suffer not only from worse accuracy but also from worse robustness and fairness than empirical defences. We hypothesise that certified defences generalise poorly because of (i) the large number of relaxed non-convex constraints and (ii) strong alignment between the adversarial perturbations and the "signal" direction. We provide a combination of theoretical and experimental evidence to support these hypotheses.
