

Mexico City Oral

A multiscale analysis of mean-field transformers in the moderate interaction regime

Giuseppe Bruno · Federico Pasqualotto · Andrea Agazzi

Don Alberto 2
Wed 3 Dec 3:30 p.m. PST — 3:50 p.m. PST

Abstract: In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number $N$ of tokens is large and the inverse temperature parameter $\beta$ of the model scales together with $N$. In this regime, the dynamics of the system displays multiscale behavior: a fast phase, where the token empirical measure collapses onto a low-dimensional space; an intermediate phase, where the measure further collapses into clusters; and a slow phase, where these clusters sequentially merge into a single one. We provide a rigorous characterization of the limiting dynamics in each of these phases and prove convergence in the above-mentioned limit, illustrating our results with simulations.
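The particle picture in the abstract can be made concrete with a small simulation. The sketch below is illustrative only: the abstract does not state the model equations, so it assumes the simplified self-attention dynamics often used in this line of work, with tokens constrained to the unit sphere and identity query, key, and value matrices. The function names (`attention_step`, `simulate`), the explicit Euler discretization, and all parameter values are hypothetical choices; in particular, `beta` is a fixed number here and does not reproduce the paper's scaling of $\beta$ with $N$.

```python
import numpy as np


def attention_step(x, beta, dt):
    """One explicit Euler step of softmax self-attention dynamics on the sphere.

    x    : (N, d) array of unit-norm token positions (assumed model, see lead-in)
    beta : inverse temperature of the softmax
    dt   : step size, playing the role of a small increment in depth
    """
    # Row-wise softmax of beta * <x_i, x_j>, shifted for numerical stability.
    logits = beta * (x @ x.T)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention-weighted average of all tokens, seen from each token.
    drift = weights @ x

    # Project the drift onto the tangent space of the sphere at each x_i.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x

    # Euler step, then renormalize so tokens stay on the unit sphere.
    x_new = x + dt * drift
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def simulate(n_tokens=64, dim=3, beta=4.0, dt=0.05, n_steps=500, seed=0):
    """Run the assumed dynamics from uniformly random points on the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(n_steps):
        x = attention_step(x, beta, dt)
    return x


if __name__ == "__main__":
    tokens = simulate()
    # Smallest pairwise inner product: values near 1 indicate a single cluster.
    print("min pairwise inner product:", float((tokens @ tokens.T).min()))
```

Recording the Gram matrix `tokens @ tokens.T` at several depths, rather than only at the end, is one way to look for the separation of time scales described above, although this toy setup is not guaranteed to display the three phases at these parameter values.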
