NeurIPS Poster Robustness Guarantees for Adversarially Trained Neural Networks


Poorya Mianjy · Raman Arora

Great Hall & Hall B1+B2 (level 1) #1919


Abstract: We study robust adversarial training of two-layer neural networks as a bi-level optimization problem. In particular, for the inner loop that implements the adversarial attack during training using projected gradient descent (PGD), we propose maximizing a lower bound on the $0/1$-loss by reflecting a surrogate loss about the origin. This allows us to give a convergence guarantee for the inner-loop PGD attack. Furthermore, assuming the data is linearly separable, we provide precise iteration complexity results for end-to-end adversarial training, which hold for any width and initialization. We provide empirical evidence to support our theoretical results.
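
As a rough illustration of the inner-loop attack described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a two-layer ReLU network $f(x) = a^\top \mathrm{relu}(Wx)$, the logistic loss as the surrogate, an $\ell_2$ perturbation ball, and one reading of "reflecting about the origin" as the point reflection $\ell_-(z) = -\ell(-z)$. All function names, the step size, and the radius `eps` are hypothetical choices made for the example.

```python
import numpy as np

def logistic_loss(z):
    # Surrogate loss l(z) = log(1 + exp(-z)), computed stably.
    return np.logaddexp(0.0, -z)

def reflected_loss(z):
    # Assumed reflection about the origin: l_-(z) = -l(-z).
    # It is non-positive, so it lower-bounds the 0/1 loss.
    return -logistic_loss(-z)

def zero_one_loss(z):
    # 0/1 loss on the margin z = y * f(x): an error when z <= 0.
    return (np.asarray(z) <= 0).astype(float)

def network(x, W, a):
    # Two-layer ReLU network f(x) = a^T relu(W x).
    return a @ np.maximum(W @ x, 0.0)

def network_grad_x(x, W, a):
    # Gradient of f with respect to the input x.
    active = (W @ x > 0).astype(float)
    return W.T @ (a * active)

def pgd_attack(x, y, W, a, eps=0.5, step=0.1, n_steps=20):
    # Projected gradient ascent on the reflected loss over the
    # l2 ball of radius eps centered at the clean input x.
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        x_adv = x + delta
        margin = y * network(x_adv, W, a)
        # d/dz of -log(1 + exp(z)) is -sigmoid(z).
        sig = 0.5 * (1.0 + np.tanh(0.5 * margin))
        grad = -sig * y * network_grad_x(x_adv, W, a)
        delta = delta + step * grad
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)  # project back onto the ball
    return x + delta

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
a = rng.choice([-1.0, 1.0], size=8) / np.sqrt(8)
x = rng.normal(size=3)
y = 1.0
x_adv = pgd_attack(x, y, W, a)
```

Ascending the reflected loss pushes the margin $y f(x')$ toward negative values, which is the direction that causes a misclassification. The projection step keeps the perturbation within the assumed budget `eps`.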
