Skip to yearly menu bar Skip to main content


( events)   Timezone:  
Oral
Tue Dec 10 10:05 AM -- 10:20 AM (PST) @ West Exhibition Hall C + B3
Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan · J. Zico Kolter
Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can {\em increase} with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot ``explain generalization'' -- even if we take into account the implicit bias of GD {\em to the fullest extent possible}. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.