Flatness-Aware Regularization for Robust Generalization in Deep Neural Networks
Abstract
Understanding the geometry of deep neural network (DNN) loss landscapes is a central concern in machine learning research because it shapes how models generalize. Prior work suggests that flatter minima often generalize better than sharper ones, as observed in small-batch training and in methods such as Entropy-SGD. The link between curvature and generalization nevertheless remains debated: some argue that sharp minima can also generalize, though the empirical evidence for this is limited, and the prevailing view associates flatter minima with robustness and sharper minima with overfitting. We introduce a flatness-aware regularization approach that explicitly penalizes curvature of the loss surface by adding a Hessian-squared trace term to the training objective, estimated efficiently with Hutchinson's stochastic trace estimator. Experiments on a CIFAR-100 subset with a two-layer MLP show that our method drives convergence toward flatter minima and improves test performance relative to an unregularized baseline. In particular, moderate regularization reduces the Hessian trace by more than an order of magnitude and raises test accuracy by approximately 1.5%. These results demonstrate that promoting flatter loss landscapes through explicit curvature penalization is an effective and practical strategy for improving generalization in deep networks.
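To make the penalty concrete, the sketch below shows one way such a regularizer could be implemented in PyTorch (an assumed framework; function and variable names such as hutchinson_trace_h2 and lambda_reg are illustrative and not taken from the paper). It uses the identity tr(H^2) = E_v[||Hv||^2] for probe vectors v with identity covariance: each estimate draws a Rademacher vector, forms a Hessian-vector product by differentiating the gradient, and squares its norm, so the Hessian is never materialized.

import torch
import torch.nn as nn


def hutchinson_trace_h2(loss, params, n_samples=1):
    # Estimate tr(H^2) = E_v[||H v||^2] with Rademacher probes v, where H is the
    # Hessian of `loss` w.r.t. `params`. Only Hessian-vector products are used.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        vs = [torch.empty_like(p).bernoulli_(0.5) * 2.0 - 1.0 for p in params]  # +/-1 probes
        grad_dot_v = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(grad_dot_v, params, create_graph=True)  # H v
        estimate = estimate + sum((hv ** 2).sum() for hv in hvs)          # ||H v||^2
    return estimate / n_samples


# Illustrative training step: a two-layer MLP on flattened 32x32x3 inputs with
# 100 classes (hypothetical hyperparameters, not the paper's exact setup).
model = nn.Sequential(nn.Linear(3072, 256), nn.ReLU(), nn.Linear(256, 100))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lambda_reg = 1e-4  # curvature-penalty strength (assumed value)

x, y = torch.randn(32, 3072), torch.randint(0, 100, (32,))
task_loss = nn.functional.cross_entropy(model(x), y)
penalty = hutchinson_trace_h2(task_loss, list(model.parameters()))
(task_loss + lambda_reg * penalty).backward()
optimizer.step()

Because the penalty itself is differentiated during backpropagation, each update involves third-order automatic differentiation, which stays tractable for small models such as the two-layer MLP used in the experiments.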