Poster Session
Workshop: Mathematics of Modern Machine Learning (M3L)

Posters in this session:

A PAC-Bayesian Perspective on the Interpolating Information Criterion

Graph Neural Networks Benefit from Structural Information Provably: A Feature Learning Perspective

Linear attention is (maybe) all you need (to understand transformer optimization)

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

Feature Learning in Infinite-Depth Neural Networks

Variational Classification

Implicit biases in multitask and continual learning from a backward error analysis perspective

Spectrum Extraction and Clipping for Implicitly Linear Layers

The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization

Curvature-Dimension Tradeoff for Generalization in Hyperbolic Space

Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations

Unveiling the Hessian's Connection to the Decision Boundary

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

Large Learning Rates Improve Generalization: But How Large Are We Talking About?

Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization

Generalization Guarantees of Deep ResNets in the Mean-Field Regime

Theoretical Explanation for Generalization from Adversarial Perturbations

In-Context Convergence of Transformers

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States

Unraveling the Complexities of Simplicity Bias: Mitigating and Amplifying Factors

Transformers as Support Vector Machines

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

A Theoretical Study of Dataset Distillation

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty

In-Context Learning on Unstructured Data: Softmax Attention as a Mixture of Experts

Attention-Only Transformers and Implementing MLPs with Attention Heads

Privacy at Interpolation: Precise Analysis for Random and NTK Features

Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation

A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

How does Gradient Descent Learn Features --- A Local Analysis for Regularized Two-Layer Neural Networks

Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP

Provably Efficient CVaR RL in Low-rank MDPs

Analysis of Task Transferability in Large Pre-trained Classifiers

On Scale-Invariant Sharpness Measures

Gibbs-Based Information Criteria and the Over-Parameterized Regime

Grokking modular arithmetic can be explained by margin maximization
