Can Network Flatness Explain the Training Speed-Generalisation Connection?
Albert Q. Jiang · Clare Lyle · Lisa Schut · Yarin Gal
Event URL: https://openreview.net/forum?id=Do1pAET8UEI

Recent work has shown that training speed, as estimated by the sum of losses over training, is predictive of generalisation performance. From a Bayesian perspective, this metric can be theoretically linked to the marginal likelihood in linear models. However, it is unclear why the relationship holds for deep neural networks (DNNs) and what the underlying mechanisms are. We hypothesise that the relationship holds in DNNs because of network flatness, which causes both fast training and good generalisation. We investigate this hypothesis in varying settings and find that it may hold when the variance of the stochastic gradient estimates is moderate, with either logit averaging or no data transformation at all. This paper specifies the conditions future work should impose when investigating the connecting mechanism.
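The training-speed estimator the abstract refers to can be sketched in a few lines: record the training loss at each optimisation step and sum the values, so that a model whose loss decays quickly accumulates a smaller sum. This is a minimal illustration with synthetic loss curves, not the authors' experimental code; the decay rates below are hypothetical.

```python
import math

def training_speed_estimate(step_losses):
    """Sum of per-step training losses: a smaller sum indicates
    faster training, the proxy for generalisation discussed above."""
    return sum(step_losses)

# Hypothetical loss curves: one model fits quickly, one slowly.
fast_learner = [math.exp(-0.5 * t) for t in range(10)]
slow_learner = [math.exp(-0.05 * t) for t in range(10)]

# The fast learner accumulates less total loss over training.
assert training_speed_estimate(fast_learner) < training_speed_estimate(slow_learner)
```

In practice the per-step losses would come from the actual training log rather than a synthetic curve.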

Author Information

Albert Q. Jiang (University of Cambridge)
Clare Lyle (University of Oxford)
Lisa Schut (University of Oxford)
Yarin Gal (University of Oxford)