Skip to yearly menu bar Skip to main content

Workshop: Mathematics of Modern Machine Learning (M3L)

Theoretical Explanation for Generalization from Adversarial Perturbations

Soichiro Kumano · Hiroshi Kera · Toshihiko Yamasaki


It is not fully understood why adversarial examples can deceive neural networks and transfer between different networks. To elucidate this, several studies hypothesized that adversarial perturbations contain data features that are imperceptible to humans but still recognizable by neural networks. Empirical evidence has shown that neural networks trained on mislabeled samples with these perturbations can generalize to natural test data. However, a theoretical understanding of this counterintuitive phenomenon is limited. In this study, assuming orthogonal training samples, we first prove that one-hidden-layer neural networks can learn natural data structures from adversarial perturbations. Our results indicate that, under mild conditions, the decision boundary from learning perturbations aligns with that from natural data, except for specific points in the input space.

Chat is not available.