Adaptive Norm Selection Prevents Catastrophic Overfitting in Fast Adversarial Training
Fares Mehouachi · Saif Eddin Jabari
Abstract
We present a novel solution to Catastrophic Overfitting (CO) in fast adversarial training based solely on adaptive $l^p$ norm selection. Unlike existing methods that require noise injection, regularization, or gradient clipping, our approach dynamically adjusts the training norm based on gradient concentration, preventing the vulnerability to multi-step attacks that plagues single-step methods. We begin with the empirical observation that, for small perturbations, CO occurs predominantly under the $l^{\infty}$ norm rather than the $l^2$ norm. Building on this observation, we formulate generalized $l^p$ attacks as a fixed-point problem and develop $l^p$-FGSM to analyze the $l^2$-to-$l^{\infty}$ transition. Our key discovery is that CO arises when concentrated gradients, whose information is localized in a few dimensions, meet aggressive norm constraints. We quantify gradient concentration via the Participation Ratio from quantum mechanics and via entropy metrics, yielding an adaptive $l^p$-FGSM that dynamically adjusts the training norm based on gradient structure. Experiments show that our method achieves robust performance without auxiliary regularization or noise injection, offering a principled solution to the CO problem.
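The abstract mentions generalized $l^p$ attacks and the Participation Ratio as a measure of gradient concentration. The following is a minimal sketch, assuming PyTorch, of how these quantities could be computed: `participation_ratio` uses the standard definition $(\sum_i g_i^2)^2 / \sum_i g_i^4$, `lp_fgsm_step` uses the well-known dual-norm maximizer of a linearized loss under an $l^p$ budget, and `adaptive_p` is a purely hypothetical schedule, not the paper's stated adaptation rule.

```python
import torch


def participation_ratio(g: torch.Tensor) -> float:
    """Participation ratio of a gradient: (sum g_i^2)^2 / sum g_i^4.
    Ranges from 1 (fully concentrated in one dimension) to d (uniformly spread)."""
    g2 = g.flatten() ** 2
    return (g2.sum() ** 2 / (g2 ** 2).sum()).item()


def lp_fgsm_step(g: torch.Tensor, eps: float, p: float) -> torch.Tensor:
    """Single-step perturbation of l^p norm eps that maximizes <g, delta>.
    The maximizer follows the dual-norm direction: delta_i ~ sign(g_i) |g_i|^(1/(p-1));
    as p -> inf this recovers the usual FGSM sign step, and p = 2 gives l^2-normalized ascent."""
    if p == float("inf"):
        return eps * g.sign()
    q = p / (p - 1.0)                      # dual exponent, 1/p + 1/q = 1
    d = g.sign() * g.abs().pow(q - 1.0)    # q - 1 = 1/(p - 1)
    return eps * d / d.norm(p).clamp_min(1e-12)


def adaptive_p(g: torch.Tensor, p_min: float = 2.0, p_max: float = 16.0) -> float:
    """Hypothetical schedule (illustration only): concentrated gradients (small
    participation ratio) get a p closer to 2, dispersed gradients a larger p."""
    frac = participation_ratio(g) / g.numel()   # normalized PR in (0, 1]
    return p_min + (p_max - p_min) * frac
```

As a usage sketch, one would compute the loss gradient with respect to the input, pick `p = adaptive_p(grad)`, and perturb with `x + lp_fgsm_step(grad, eps, p)` before the training update; the actual adaptation in the paper also involves entropy metrics not detailed in the abstract.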