Workshop

Mathematics of Modern Machine Learning (M3L)

Bingbin Liu ⋅ Kaifeng Lyu ⋅ Sadhika Malladi ⋅ Samory Kpotufe ⋅ Stefanie Jegelka ⋅ Tengyu Ma ⋅ Zhiyuan Li

Abstract

This workshop explores theory for understanding and advancing modern ML practices, with a focus on mathematical models for empirical deep learning phenomena.

Schedule

Timezone: America/Los_Angeles

8:50 AM

Opening Remarks

9:00 AM

Flat Minima and Generalization: from Matrix Sensing to Neural Networks

Maryam Fazel

9:45 AM

A Theoretical Perspective on Hardness of Sampling and Learning from Samples in High Dimensions

Lenka Zdeborová

10:30 AM

Classifier-Free Guidance is a Predictor-Corrector

Arwen Bradley ⋅ Preetum Nakkiran

10:45 AM

Towards characterizing the value of edge embeddings in Graph Neural Networks

Dhruv Rohatgi ⋅ Tanya Marwah ⋅ Zachary Lipton ⋅ Jianfeng Lu ⋅ Ankur Moitra ⋅ Andrej Risteski

11:00 AM

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

Siyu Chen ⋅ Beining Wu ⋅ Miao Lu ⋅ Zhuoran Yang ⋅ Tianhao Wang

11:15 AM

Poster Session 1

12:15 PM

Lunch Break

1:30 PM

Scaling Deep Learning Optimization: Insights into Efficiency, Preconditioning, and Critical Batch Sizes

Sham Kakade

2:15 PM

Open problems in LLM Theory, DL theory, and the role of theory

Matus Telgarsky

3:00 PM

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

Riccardo Grazzi ⋅ Julien Siems ⋅ Jörg Franke ⋅ Arber Zela ⋅ Frank Hutter ⋅ Massimiliano Pontil

3:15 PM

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani ⋅ Jason Lee ⋅ Alberto Bietti

3:30 PM

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

Tianyu Guo ⋅ Druv Pai ⋅ Yu Bai ⋅ Jiantao Jiao ⋅ Michael Jordan ⋅ Song Mei

3:45 PM

Mixture of Parrots: Mixtures of experts improve memorization more than reasoning

Samy Jelassi ⋅ Clara Mohri ⋅ David Brandfonbrener ⋅ Alex Gu ⋅ Nikhil Vyas ⋅ Nikhil Anand ⋅ David Alvarez-Melis ⋅ Yuanzhi Li ⋅ Sham Kakade ⋅ Eran Malach

4:00 PM

Poster Session 2

Does Machine Bring in Extra Bias in Learning? Approximating Discrimination Within Models Quickly

Yijun Bian ⋅ Yujie Luo ⋅ Ping Xu

On the Implicit Relation between Low-Rank Adaptation and Differential Privacy

Saber Malekmohammadi ⋅ Golnoosh Farnadi

Self-Improvement in Language Models: The Sharpening Mechanism

Audrey Huang ⋅ Adam Block ⋅ Dylan J Foster ⋅ Dhruv Rohatgi ⋅ Cyril Zhang ⋅ Max Simchowitz ⋅ Jordan Ash ⋅ Akshay Krishnamurthy

SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network

Tomer Galanti ⋅ Zachary Siegel ⋅ Aparna Gupte ⋅ Tomaso Poggio

Information-Theoretic Generalization Bounds for Batch Reinforcement Learning

Xingtu Liu

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Neil Mallinar ⋅ Daniel Beaglehole ⋅ Libin Zhu ⋅ Adityanarayanan Radhakrishnan ⋅ Parthe Pandit ⋅ Misha Belkin

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Binghui Li ⋅ Yuanzhi Li

Depth Extrapolation of Decoders Trained on Nested Structures

Emile Richard

Diffusion Model Learns Low-Dimensional Distributions via Subspace Clustering

Peng Wang ⋅ Huijie Zhang ⋅ Zekai Zhang ⋅ Siyi Chen ⋅ Yi Ma ⋅ Qing Qu

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos

Dayal Singh Kalra ⋅ Tianyu He ⋅ Maissam Barkeshli

Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

Juno Kim ⋅ Dimitri Meunier ⋅ Arthur Gretton ⋅ Taiji Suzuki ⋅ Zhu Li

How do students become teachers: A dynamical analysis for two-layer neural networks

Zhenyu Zhu ⋅ Fanghui Liu ⋅ Volkan Cevher

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

Aaron Alvarado Kristanto Julistiono ⋅ Davoud Ataee Tarzanagh ⋅ Navid Azizan

Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks

Ouns El Harzli ⋅ Bernardo Grau

Convergence of Distributed Adaptive Optimization with Local Updates

Ziheng Cheng ⋅ Margalit Glasgow

Progressive distillation induces an implicit curriculum

Abhishek Panigrahi ⋅ Bingbin Liu ⋅ Sadhika Malladi ⋅ Andrej Risteski ⋅ Surbhi Goel

Comparing Implicit and Denoising Score-Matching Objectives

Artem Artemev ⋅ Ayan Das ⋅ Farhang Nabiei ⋅ Alberto Bernacchia

Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling

Xiao Li ⋅ Zekai Zhang ⋅ Xiang Li ⋅ Siyi Chen ⋅ Zhihui Zhu ⋅ Peng Wang ⋅ Qing Qu

Benign Overfitting in Single-Head Attention

Roey Magen ⋅ Shuning Shang ⋅ Zhiwei Xu ⋅ Spencer Frei ⋅ Wei Hu ⋅ Gal Vardi

The GAN is dead; long live the GAN! A Modern GAN Baseline

Nick Huang ⋅ Aaron Gokaslan ⋅ Volodymyr Kuleshov ⋅ James Tompkin

Information-Theoretic Foundations for Neural Scaling Laws

Hong Jun Jeon ⋅ Benjamin Van Roy

Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

Anchit Jain ⋅ Rozhin Nobahari ⋅ Aristide Baratin ⋅ Stefano Sarao Mannelli

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill ⋅ Ashish Sabharwal

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Frederik Kunstner ⋅ Robin Yadav ⋅ Alan Milligan ⋅ Mark Schmidt ⋅ Alberto Bietti

Provable weak-to-strong generalization via benign overfitting

David Wu ⋅ Anant Sahai

On Your Mark, Get Set, Warmup!

Dayal Singh Kalra ⋅ Maissam Barkeshli

Continuous-Time Analysis of Adaptive Optimization and Normalization

Rhys Gould ⋅ Hidenori Tanaka

Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning

Alexey Rukhovich ⋅ Alexander Podolskiy ⋅ Irina Piontkovskaya

Transformers are Efficient Compilers, Provably

Xiyu Zhai ⋅ Runlong Zhou ⋅ Liao Zhang ⋅ Simon Du

Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study

Pengfei He ⋅ Yingqian Cui ⋅ Han Xu ⋅ Hui Liu ⋅ Makoto Yamada ⋅ Jiliang Tang ⋅ Yue Xing

Towards characterizing the value of edge embeddings in Graph Neural Networks

Dhruv Rohatgi ⋅ Tanya Marwah ⋅ Zachary Lipton ⋅ Jianfeng Lu ⋅ Ankur Moitra ⋅ Andrej Risteski

Optimizing Fine-Tuning Efficiency: Gradient Subspace Tracking on Grassmann Manifolds for Large Language Models

Sahar Rajabi ⋅ Sirisha Rambhatla

Benign Overfitting in Out-of-Distribution Generalization of Linear Models

Shange Tang ⋅ Jiayun Wu ⋅ Jianqing Fan ⋅ Chi Jin

Dynamics of Concept Learning and Compositional Generalization

Yongyi Yang ⋅ Core Francisco Park ⋅ Ekdeep S Lubana ⋅ Maya Okawa ⋅ Wei Hu ⋅ Hidenori Tanaka

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter ⋅ Sascha Bongni ⋅ Ido Hakimi ⋅ Andreas Krause

Declarative characterizations of direct preference alignment algorithms

Kyle Richardson ⋅ Vivek Srikumar ⋅ Ashish Sabharwal

Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets

Yuandong Tian

Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks

Ouns El Harzli ⋅ Bernardo Grau

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clémentine Dominé ⋅ Nicolas Anguita ⋅ Alexandra Proca ⋅ Lukas Braun ⋅ Daniel Kunin ⋅ Pedro A.M. Mediano ⋅ Andrew Saxe

Geometric Deep Learning with Quasiconformal Neural Networks: An Introduction

Nico Alvarado ⋅ Hans Lobel

Sample compression unleashed: New generalization bounds for real-valued losses

Mathieu Bazinet ⋅ Valentina Zantedeschi ⋅ Pascal Germain

Increasing Fairness via Combination with Learning Guarantees

Yijun Bian ⋅ Kun Zhang

Simple and Effective Masked Diffusion Language Models

Subham Sahoo ⋅ Marianne Arriola ⋅ Aaron Gokaslan ⋅ Yair Schiff ⋅ Edgar Marroquin ⋅ Justin Chiu ⋅ Alexander Rush ⋅ Volodymyr Kuleshov

Convergence Properties of Hyperbolic Neural Networks on Riemannian Manifolds

Nico Alvarado ⋅ Sebastian Burgos

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani ⋅ Jason Lee ⋅ Alberto Bietti

Leveraging Intermediate Neural Collapse: Fixing Layers Beyond Effective Depth to Simplex ETFs for Efficient Deep Neural Networks

Emily Liu

A Theory of Initialisation's Impact on Specialisation

Devon Jarvis ⋅ Sebastian Lee ⋅ Clémentine Dominé ⋅ Andrew Saxe ⋅ Stefano Sarao Mannelli

An empirical study of the $(L_0, L_1)$-smoothness condition

Y Cooper

Diffusion Models With Learned Adaptive Noise Processes

Subham Sahoo ⋅ Aaron Gokaslan ⋅ Christopher De Sa ⋅ Volodymyr Kuleshov

Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift

Mitsuhiro Fujikawa ⋅ Youhei Akimoto ⋅ Jun Sakuma ⋅ Kazuto Fukuchi

A Theoretical Framework for Federated Domain Generalization with Gradient Alignment

Mahdiyar Molahasani ⋅ Milad Soltany ⋅ Farhad Pourpanah ⋅ Michael Greenspan ⋅ Ali Etemad

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

Tianyu Guo ⋅ Druv Pai ⋅ Yu Bai ⋅ Jiantao Jiao ⋅ Michael Jordan ⋅ Song Mei

In-Context Learning by Linear Attention: Exact Asymptotics and Experiments

Yue Lu ⋅ Mary Letey ⋅ Jacob Zavatone-Veth ⋅ Anindita Maiti ⋅ Cengiz Pehlevan

The Crucial Role of Samplers in Online Direct Preference Optimization

Ruizhe Shi ⋅ Runlong Zhou ⋅ Simon Du

Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization

Matan Schliserman ⋅ Tomer Koren

Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

Ally Du ⋅ Lin Yang ⋅ Ruosong Wang

Exploring Task Affinities through NTK Alignment and Early Training Dynamics in Multi-Task Learning

Yoann Morello ⋅ Emilie Grégoire ⋅ Sam Verboven

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Yuda Song ⋅ Hanlin Zhang ⋅ Udaya Ghai ⋅ Carson Eisenach ⋅ Sham Kakade ⋅ Dean Foster

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

Riccardo Grazzi ⋅ Julien Siems ⋅ Jörg Franke ⋅ Arber Zela ⋅ Frank Hutter ⋅ Massimiliano Pontil

Transformers Provably Solve Parity Efficiently with Chain of Thought

Juno Kim ⋅ Taiji Suzuki

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Yibo Jiang ⋅ Goutham Rajendran ⋅ Pradeep Ravikumar ⋅ Bryon Aragam

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

Kairong Luo ⋅ Haodong Wen ⋅ Shengding Hu ⋅ Zhenbo Sun ⋅ Zhiyuan Liu ⋅ Maosong Sun ⋅ Kaifeng Lyu ⋅ Wenguang Chen

Algorithmic Stability of Minimum-Norm Interpolating Deep Neural Networks

Ouns El Harzli ⋅ Yoonsoo Nam ⋅ Ilja Kuzborskij ⋅ Bernardo Grau ⋅ Ard Louis

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin ⋅ Sadhika Malladi ⋅ Adithya Bhaskar ⋅ Danqi Chen ⋅ Sanjeev Arora ⋅ Boris Hanin

Can Bayesian Neural Networks Make Confident Predictions?

Katharine Fisher

Provable unlearning in topic modeling and downstream tasks

Stanley Wei ⋅ Sadhika Malladi ⋅ Sanjeev Arora ⋅ Amartya Sanyal

Implicit Bias of Adam versus Gradient Descent in One-Hidden-Layer Neural Networks

Bhavya Vasudeva ⋅ Vatsal Sharan ⋅ Mahdi Soltanolkotabi

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

Kaiyue Wen ⋅ Huaqing Zhang ⋅ Hongzhou Lin ⋅ Jingzhao Zhang

HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks

Yongyi Yang ⋅ Jiaming Yang ⋅ Wei Hu ⋅ Michal Derezinski

Parameter Symmetry and Emergence of Noise Equilibrium in Stochastic Training

Liu Ziyin ⋅ Mingze Wang ⋅ Hongchao Li ⋅ Lei Wu

Improving the Gaussian Approximation in Neural Networks: Para-Gaussians and Edgeworth Expansions

Mihai Nica ⋅ Janosch Ortmann

Mixture of Parrots: Mixtures of experts improve memorization more than reasoning

Samy Jelassi ⋅ Clara Mohri ⋅ David Brandfonbrener ⋅ Alex Gu ⋅ Nikhil Vyas ⋅ Nikhil Anand ⋅ David Alvarez-Melis ⋅ Yuanzhi Li ⋅ Sham Kakade ⋅ Eran Malach

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

Siyu Chen ⋅ Beining Wu ⋅ Miao Lu ⋅ Zhuoran Yang ⋅ Tianhao Wang

Label Noise: Ignorance Is Bliss

Yilun Zhu ⋅ Jianxin Zhang ⋅ Aditya Gangrade ⋅ Clay Scott

Optimal Protocols for Continual Learning via Statistical Physics and Control Theory

Francesco Mori ⋅ Stefano Sarao Mannelli ⋅ Francesca Mignacco

How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework

Yinuo Ren ⋅ Haoxuan Chen ⋅ Grant Rotskoff ⋅ Lexing Ying