Spectral Regularization as a Safety-Critical Inductive Bias
Abstract
Deep neural networks exhibit a "spectral bias," a tendency to learn low-frequency functions more easily than high-frequency ones. This bias leaves them vulnerable to adversarial attacks, which introduce subtle, predominantly high-frequency perturbations to inputs that cause catastrophic model failures. This paper introduces Fourier Gradient Regularization (FGR), a physics-inspired training method that directly addresses this vulnerability. By penalizing the high-frequency components of the model's input gradients during training, analogous to a coarse-graining procedure in physics, FGR induces a smoothness prior, forcing the model to become less sensitive to the very perturbations adversaries exploit. Our empirical results on CIFAR-10 with a ResNet-18 architecture demonstrate that FGR can more than double robust accuracy under Projected Gradient Descent (PGD) attacks while maintaining near-identical accuracy on clean data, a highly favorable accuracy-robustness trade-off.
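As a rough illustration of the mechanism summarized above, the following PyTorch sketch shows one way such a penalty could be implemented: the input gradient of the task loss is transformed with a 2D FFT, a radial high-pass mask isolates its high-frequency energy, and that energy is added to the training objective. The names (fgr_penalty, cutoff_frac, lambda_fgr) and the exact mask shape and weighting are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a Fourier Gradient Regularization (FGR) style penalty.
# Assumed names and hyperparameters (fgr_penalty, cutoff_frac, lambda_fgr)
# are hypothetical; the paper's methods section defines the actual form.
import torch
import torch.nn.functional as F


def fgr_penalty(model, x, y, cutoff_frac=0.25):
    """Penalize high-frequency energy in the input gradient of the task loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Input gradient of the task loss, shape (B, C, H, W); create_graph=True
    # so the penalty remains differentiable w.r.t. the model parameters.
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]

    # 2D FFT of the gradient; shift so low frequencies sit at the center.
    g_hat = torch.fft.fftshift(torch.fft.fft2(grad), dim=(-2, -1))

    # Radial high-pass mask: keep frequencies beyond cutoff_frac of Nyquist.
    _, _, H, W = grad.shape
    fy = torch.fft.fftshift(torch.fft.fftfreq(H, device=x.device))
    fx = torch.fft.fftshift(torch.fft.fftfreq(W, device=x.device))
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    high_pass = (radius > cutoff_frac * 0.5).to(grad.dtype)

    # Mean spectral energy of the high-frequency part of the input gradient.
    return (high_pass * g_hat.abs() ** 2).mean()


def training_step(model, x, y, optimizer, lambda_fgr=0.1):
    """Standard task loss plus the spectral smoothness penalty."""
    optimizer.zero_grad()
    task_loss = F.cross_entropy(model(x), y)
    total = task_loss + lambda_fgr * fgr_penalty(model, x, y)
    total.backward()
    optimizer.step()
    return total.item()
```

In this reading, lambda_fgr controls the accuracy-robustness trade-off reported in the abstract: larger values smooth the input gradients more aggressively at some cost to clean accuracy.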