

Poster in Workshop: Associative Memory & Hopfield Networks in 2023

Associative Memory Under the Probabilistic Lens: Improved Transformers & Dynamic Memory Creation

Rylan Schaeffer · Mikail Khona · Nika Zahedi · Ila Fiete · Andrey Gromov · Sanmi Koyejo


Abstract:

Clustering is a fundamental unsupervised learning problem, and recent work showed that modern continuous associative memory (AM) networks can learn to cluster data via a novel unconstrained continuous relaxation of the discrete clustering optimization problem. In this work, we demonstrate that the energy function of that AM network can be viewed as the scaled negative log likelihood of a Gaussian mixture model, and that the dynamics of the AM network can be viewed as performing expectation maximization via gradient ascent rather than via closed-form coordinate ascent. Based on this insight, we show that a widespread practical implementation choice, self-attention with pre-layer normalization, approximates clustering on the hypersphere with inhomogeneous von Mises-Fisher likelihoods, suggesting a future experiment to improve transformers. We additionally leverage this connection to propose a novel AM network with the ability to create new memories during learning, as necessitated by the data, by drawing on tools from combinatorial stochastic processes and Bayesian nonparametrics.
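To make the stated connection concrete, the following is a minimal sketch in illustrative notation (the equal-weight, isotropic mixture, the inverse temperature $\beta$, and the symbols $\xi_\mu$ are assumptions for exposition, not necessarily the paper's parameterization). For a mixture of $K$ Gaussians with means $\xi_1, \dots, \xi_K$ and covariance $\beta^{-1} I$, the negative log likelihood of a point $x$ is, up to additive constants, a log-sum-exp energy of the kind used by modern continuous AM networks:

% Sketch only: assumed equal-weight, isotropic Gaussian mixture; notation illustrative.
\[
  -\log p(x) \;=\; -\log \sum_{\mu=1}^{K} \exp\!\Big(-\tfrac{\beta}{2}\,\lVert x - \xi_\mu \rVert^2\Big) \;+\; \mathrm{const}.
\]
Writing the E-step responsibilities as
\[
  r_\nu(x) \;=\; \frac{\exp\!\big(-\tfrac{\beta}{2}\,\lVert x - \xi_\nu \rVert^2\big)}{\sum_{\mu=1}^{K} \exp\!\big(-\tfrac{\beta}{2}\,\lVert x - \xi_\mu \rVert^2\big)},
\]
the gradient of the log likelihood with respect to a mean is
\[
  \nabla_{\xi_\nu} \log p(x) \;=\; \beta\, r_\nu(x)\,\big(x - \xi_\nu\big),
\]
so a gradient ascent step nudges each memory toward the responsibility-weighted data rather than jumping to the closed-form M-step average; this is one way to read "EM via gradient ascent rather than closed-form coordinate ascent." If, in addition, $x$ and the $\xi_\mu$ lie (approximately) on the unit sphere, as pre-layer normalization roughly enforces, then $\lVert x - \xi_\mu \rVert^2 = 2 - 2\,\xi_\mu^\top x$ and the same responsibilities become $\operatorname{softmax}_\mu(\beta\, \xi_\mu^\top x)$, the posterior weights of von Mises-Fisher components with concentration $\beta$, resembling the softmax-over-dot-products form of self-attention (with the inhomogeneity noted in the abstract arising when norms or concentrations differ across components).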
