Poster Session
Workshop: Mathematics of Modern Machine Learning (M3L)

Posters in this session:

Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

On the Computational Complexity of Inverting Generative Models

Flow-Based High-Dimensionally Distributional Robust Optimization

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

A Theoretical Explanation of Deep RL Performance in Stochastic Environments

Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models

Under-Parameterized Double Descent for Ridge Regularized Least Squares Denoising of Data on a Line

Continual Learning for Long-Tailed Recognition: Bridging the Gap in Theory and Practice

SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate

On Compositionality and Emergence in Physical Systems Generative Modeling

Escaping Random Teacher Initialization Enhances Signal Propagation and Representations

The Expressive Power of Transformers with Chain of Thought

Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning

Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization

MoXCo: How I learned to stop exploring and love my local minima?

First-order ANIL provably learns representations despite overparametrisation

A Data-Driven Measure of Relative Uncertainty for Misclassification Detection

Non-Vacuous Generalization Bounds for Large Language Models

Learning from setbacks: the impact of adversarial initialization on generalization performance

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo

Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

Toward Student-oriented Teacher Network Training for Knowledge Distillation

Adaptive Sharpness-Aware Pruning for Robust Sparse Networks

Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations

How Structured Data Guides Feature Learning: A Case Study of the Parity Problem

The Next Symbol Prediction Problem: PAC-learning and its relation to Language Models

Why Do We Need Weight Decay for Overparameterized Deep Networks?

The Double-Edged Sword: Perception and Uncertainty in Inverse Problems

Near-Interpolators: Fast Norm Growth and Tempered Near-Overfitting

On robust overfitting: adversarial training induced distribution matters

Are Graph Neural Networks Optimal Approximation Algorithms?

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention