Workshop: Workshop on Machine Learning Safety

Unifying Grokking and Double Descent

Xander Davies · Lauro Langosco · David Krueger


Building a principled understanding of generalization in deep learning requires unifying disparate observations under a single conceptual framework. Previous work has studied grokking, a training dynamic in which a sustained period of near-perfect training performance and near-chance test performance is eventually followed by generalization, as well as the superficially similar double descent. These topics have so far been studied in isolation. We hypothesize that grokking and double descent can be understood as instances of the same learning dynamics within a framework of pattern learning speeds, and that this framework also applies when varying model capacity instead of optimization steps. We confirm some implications of this hypothesis empirically, including demonstrating model-wise grokking.

Chat is not available.