Unifying Grokking and Double Descent
Xander Davies · Lauro Langosco · David Krueger

Building a principled understanding of generalization in deep learning requires unifying disparate observations under a single conceptual framework. Previous work has studied grokking, a training dynamic in which a sustained period of near-perfect training performance and near-chance test performance is eventually followed by generalization, as well as the superficially similar double descent. These topics have so far been studied in isolation. We hypothesize that grokking and double descent can be understood as instances of the same learning dynamics within a framework of pattern learning speeds, and that this framework also applies when varying model capacity instead of optimization steps. We confirm some implications of this hypothesis empirically, including demonstrating model-wise grokking.

Author Information

Xander Davies (Harvard University)

Hi, I’m Xander. I’m going into my fourth year at Harvard, where I study computer science. I lead the [Harvard AI Safety Team](haist.ai) and currently do deep learning theory research with David Krueger’s lab at the University of Cambridge.

Lauro Langosco (University of Cambridge)
David Krueger (University of Cambridge)
