Sat 8:50 a.m. - 9:00 a.m.
|
Opening Remarks
SlidesLive Video
|
|
Sat 9:00 a.m. - 9:45 a.m.
|
Flat Minima and Generalization: from Matrix Sensing to Neural Networks
(
Invited Talk
)
>
SlidesLive Video
|
Maryam Fazel
|
Sat 9:45 a.m. - 10:30 a.m.
|
A Theoretical Perspective on Hardness of Sampling and Learning from Samples in High Dimensions
(
Invited Talk
)
>
SlidesLive Video
|
Lenka Zdeborová
|
Sat 10:30 a.m. - 10:45 a.m.
|
Classifier-Free Guidance is a Predictor-Corrector
(
Oral
)
>
link
SlidesLive Video
|
Arwen Bradley · Preetum Nakkiran
|
Sat 10:45 a.m. - 11:00 a.m.
|
Towards characterizing the value of edge embeddings in Graph Neural Networks
(
Oral
)
>
link
SlidesLive Video
|
Dhruv Rohatgi · Tanya Marwah · Zachary Lipton · Jianfeng Lu · Ankur Moitra · Andrej Risteski
|
Sat 11:00 a.m. - 11:15 a.m.
|
Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
(
Oral
)
>
link
SlidesLive Video
|
Siyu Chen · Beining Wu · Miao Lu · Zhuoran Yang · Tianhao Wang
|
Sat 11:15 a.m. - 12:15 p.m.
|
Poster Session 1
(
Poster Session
)
>
|
|
Sat 12:15 p.m. - 1:30 p.m.
|
Lunch Break
|
|
Sat 1:30 p.m. - 2:15 p.m.
|
Scaling Deep Learning Optimization: Insights into Efficiency, Preconditioning, and Critical Batch Sizes
(
Invited Talk
)
>
SlidesLive Video
|
Sham Kakade
|
Sat 2:15 p.m. - 3:00 p.m.
|
Open problems in LLM Theory, DL theory, and the role of theory
(
Invited Talk
)
>
SlidesLive Video
|
Matus Telgarsky
|
Sat 3:00 p.m. - 3:15 p.m.
|
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
(
Oral
)
>
link
SlidesLive Video
|
Riccardo Grazzi · Julien Siems · Jörg Franke · Arber Zela · Frank Hutter · Massimiliano Pontil
|
Sat 3:15 p.m. - 3:30 p.m.
|
Understanding Factual Recall in Transformers via Associative Memories
(
Oral
)
>
link
SlidesLive Video
|
Eshaan Nichani · Jason Lee · Alberto Bietti
|
Sat 3:30 p.m. - 3:45 p.m.
|
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
(
Oral
)
>
link
SlidesLive Video
|
Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei
|
Sat 3:45 p.m. - 4:00 p.m.
|
Mixture of Parrots: Mixtures of experts improve memorization more than reasoning
(
Oral
)
>
link
SlidesLive Video
|
Samy Jelassi · Clara Mohri · David Brandfonbrener · Alex Gu · Nikhil Vyas · Nikhil Anand · David Alvarez-Melis · Yuanzhi Li · Sham Kakade · Eran Malach
|
Sat 4:00 p.m. - 5:00 p.m.
|
Poster Session 2
(
Poster Session
)
>
|
|
-
|
Does Machine Bring in Extra Bias in Learning? Approximating Discrimination Within Models Quickly
(
Poster
)
>
link
|
Yijun Bian · Yujie Luo · Ping Xu
|
-
|
On the Implicit Relation between Low-Rank Adaptation and Differential Privacy
(
Poster
)
>
link
|
Saber Malekmohammadi · Golnoosh Farnadi
|
-
|
Self-Improvement in Language Models: The Sharpening Mechanism
(
Poster
)
>
link
|
Audrey Huang · Adam Block · Dylan J Foster · Dhruv Rohatgi · Cyril Zhang · Max Simchowitz · Jordan Ash · Akshay Krishnamurthy
|
-
|
SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network
(
Poster
)
>
link
|
Tomer Galanti · Zachary Siegel · Aparna Gupte · Tomaso Poggio
|
-
|
Information-Theoretic Generalization Bounds for Batch Reinforcement Learning
(
Poster
)
>
link
|
Xingtu Liu
|
-
|
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
(
Poster
)
>
link
|
Neil Mallinar · Daniel Beaglehole · Libin Zhu · Adityanarayanan Radhakrishnan · Parthe Pandit · Misha Belkin
|
-
|
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
(
Poster
)
>
link
|
Binghui Li · Yuanzhi Li
|
-
|
Depth Extrapolation of Decoders Trained on Nested Structures
(
Poster
)
>
link
|
Emile Richard
|
-
|
Diffusion Model Learns Low-Dimensional Distributions via Subspace Clustering
(
Poster
)
>
link
|
Peng Wang · Huijie Zhang · Zekai Zhang · Siyi Chen · Yi Ma · Qing Qu
|
-
|
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
(
Poster
)
>
link
|
Dayal Singh Kalra · Tianyu He · Maissam Barkeshli
|
-
|
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
(
Poster
)
>
link
|
Juno Kim · Dimitri Meunier · Arthur Gretton · Taiji Suzuki · Zhu Li
|
-
|
How do students become teachers: A dynamical analysis for two-layer neural networks
(
Poster
)
>
link
|
Zhenyu Zhu · Fanghui Liu · Volkan Cevher
|
-
|
Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
(
Poster
)
>
link
|
Aaron Alvarado Kristanto Julistiono · Davoud Ataee Tarzanagh · Navid Azizan
|
-
|
Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks
(
Poster
)
>
link
|
Ouns El Harzli · Bernardo Grau
|
-
|
Convergence of Distributed Adaptive Optimization with Local Updates
(
Poster
)
>
link
|
Ziheng Cheng · Margalit Glasgow
|
-
|
Progressive distillation induces an implicit curriculum
(
Poster
)
>
link
|
Abhishek Panigrahi · Bingbin Liu · Sadhika Malladi · Andrej Risteski · Surbhi Goel
|
-
|
Comparing Implicit and Denoising Score-Matching Objectives
(
Poster
)
>
link
|
Artem Artemev · Ayan Das · Farhang Nabiei · Alberto Bernacchia
|
-
|
Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling
(
Poster
)
>
link
|
Xiao Li · Zekai Zhang · Xiang Li · Siyi Chen · Zhihui Zhu · Peng Wang · Qing Qu
|
-
|
Benign Overfitting in Single-Head Attention
(
Poster
)
>
link
|
Roey Magen · Shuning Shang · Zhiwei Xu · Spencer Frei · Wei Hu · Gal Vardi
|
-
|
The GAN is dead; long live the GAN! A Modern GAN Baseline
(
Poster
)
>
link
|
Nick Huang · Aaron Gokaslan · Volodymyr Kuleshov · James Tompkin
|
-
|
Information-Theoretic Foundations for Neural Scaling Laws
(
Poster
)
>
link
|
Hong Jun Jeon · Benjamin Van Roy
|
-
|
Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training
(
Poster
)
>
link
|
Anchit Jain · Rozhin Nobahari · Aristide Baratin · Stefano Sarao Mannelli
|
-
|
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
(
Poster
)
>
link
|
Will Merrill · Ashish Sabharwal
|
-
|
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
(
Poster
)
>
link
|
Frederik Kunstner · Robin Yadav · Alan Milligan · Mark Schmidt · Alberto Bietti
|
-
|
Provable weak-to-strong generalization via benign overfitting
(
Poster
)
>
link
|
David Wu · Anant Sahai
|
-
|
On Your Mark, Get Set, Warmup!
(
Poster
)
>
link
|
Dayal Singh Kalra · Maissam Barkeshli
|
-
|
Continuous-Time Analysis of Adaptive Optimization and Normalization
(
Poster
)
>
link
|
Rhys Gould · Hidenori Tanaka
|
-
|
Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning
(
Poster
)
>
link
|
Alexey Rukhovich · Alexander Podolskiy · Irina Piontkovskaya
|
-
|
Transformers are Efficient Compilers, Provably
(
Poster
)
>
link
|
Xiyu Zhai · Runlong Zhou · Liao Zhang · Simon Du
|
-
|
Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study
(
Poster
)
>
link
|
Pengfei He · Yingqian Cui · Han Xu · Hui Liu · Makoto Yamada · Jiliang Tang · Yue XING
|
-
|
Towards characterizing the value of edge embeddings in Graph Neural Networks
(
Poster
)
>
link
|
Dhruv Rohatgi · Tanya Marwah · Zachary Lipton · Jianfeng Lu · Ankur Moitra · Andrej Risteski
|
-
|
Optimizing Fine-Tuning Efficiency: Gradient Subspace Tracking on Grassmann Manifolds for Large Language Models
(
Poster
)
>
link
|
Sahar Rajabi · Sirisha Rambhatla
|
-
|
Benign Overfitting in Out-of-Distribution Generalization of Linear Models
(
Poster
)
>
link
|
Shange Tang · Jiayun Wu · Jianqing Fan · Chi Jin
|
-
|
Dynamics of Concept Learning and Compositional Generalization
(
Poster
)
>
link
|
Yongyi Yang · Core Francisco Park · Ekdeep S Lubana · Maya Okawa · Wei Hu · Hidenori Tanaka
|
-
|
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
(
Poster
)
>
link
|
Jonas Hübotter · Sascha Bongni · Ido Hakimi · Andreas Krause
|
-
|
Declarative characterizations of direct preference alignment algorithms
(
Poster
)
>
link
|
Kyle Richardson · Vivek Srikumar · Ashish Sabharwal
|
-
|
Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets
(
Poster
)
>
link
|
Yuandong Tian
|
-
|
Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks
(
Poster
)
>
link
|
Ouns El Harzli · Bernardo Grau
|
-
|
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
(
Poster
)
>
link
|
Clémentine Dominé · Nicolas Anguita · Alexandra Proca · Lukas Braun · Daniel Kunin · Pedro A.M Mediano · Andrew Saxe
|
-
|
Geometric Deep Learning with Quasiconformal Neural Networks: An Introduction
(
Poster
)
>
link
|
Nico Alvarado · Hans Lobel
|
-
|
Sample compression unleashed: New generalization bounds for real valued losses
(
Poster
)
>
link
|
Mathieu Bazinet · Valentina Zantedeschi · Pascal Germain
|
-
|
Increasing Fairness via Combination with Learning Guarantees
(
Poster
)
>
link
|
Yijun Bian · Kun Zhang
|
-
|
Simple and Effective Masked Diffusion Language Models
(
Poster
)
>
link
|
Subham Sahoo · Marianne Arriola · Aaron Gokaslan · Yair Schiff · Edgar Marroquin · Justin Chiu · Alexander Rush · Volodymyr Kuleshov
|
-
|
Convergence Properties of Hyperbolic Neural Networks on Riemannian Manifolds
(
Poster
)
>
link
|
Nico Alvarado · Sebastian Burgos
|
-
|
Understanding Factual Recall in Transformers via Associative Memories
(
Poster
)
>
link
|
Eshaan Nichani · Jason Lee · Alberto Bietti
|
-
|
Leveraging Intermediate Neural Collapse: Fixing Layers Beyond Effective Depth to Simplex ETFs for Efficient Deep Neural Networks
(
Poster
)
>
link
|
Emily Liu
|
-
|
A Theory of Initialisation's Impact on Specialisation
(
Poster
)
>
link
|
Devon Jarvis · Sebastian Lee · Clémentine Dominé · Andrew Saxe · Stefano Sarao Mannelli
|
-
|
An empirical study of the (L0,L1)-smoothness condition
(
Poster
)
>
link
|
Y Cooper
|
-
|
Diffusion Models With Learned Adaptive Noise Processes
(
Poster
)
>
link
|
Subham Sahoo · Aaron Gokaslan · Christopher De Sa · Volodymyr Kuleshov
|
-
|
Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift
(
Poster
)
>
link
|
Mitsuhiro Fujikawa · Youhei Akimoto · Jun Sakuma · Kazuto Fukuchi
|
-
|
A Theoretical Framework for Federated Domain Generalization with Gradient Alignment
(
Poster
)
>
link
|
Mahdiyar Molahasani · Milad Soltany · Farhad Pourpanah · Michael Greenspan · Ali Etemad
|
-
|
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
(
Poster
)
>
link
|
Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei
|
-
|
In-Context Learning by Linear Attention: Exact Asymptotics and Experiments
(
Poster
)
>
link
|
Yue Lu · Mary Letey · Jacob Zavatone-Veth · Anindita Maiti · Cengiz Pehlevan
|
-
|
The Crucial Role of Samplers in Online Direct Preference Optimization
(
Poster
)
>
link
|
Ruizhe Shi · Runlong Zhou · Simon Du
|
-
|
Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization
(
Poster
)
>
link
|
Matan Schliserman · Tomer Koren
|
-
|
Misspecified Q-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
(
Poster
)
>
link
|
Ally Du · Lin Yang · Ruosong Wang
|
-
|
Exploring Task Affinities through NTK Alignment and Early Training Dynamics in Multi-Task Learning
(
Poster
)
>
link
|
Yoann Morello · Emilie Grégoire · Sam Verboven
|
-
|
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
(
Poster
)
>
link
|
Yuda Song · Hanlin Zhang · Udaya Ghai · Carson Eisenach · Sham Kakade · Dean Foster
|
-
|
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
(
Poster
)
>
link
|
Riccardo Grazzi · Julien Siems · Jörg Franke · Arber Zela · Frank Hutter · Massimiliano Pontil
|
-
|
Transformers Provably Solve Parity Efficiently with Chain of Thought
(
Poster
)
>
link
|
Juno Kim · Taiji Suzuki
|
-
|
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
(
Poster
)
>
link
|
Yibo Jiang · Goutham Rajendran · Pradeep Ravikumar · Bryon Aragam
|
-
|
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
(
Poster
)
>
link
|
Kairong Luo · Haodong Wen · Shengding Hu · Zhenbo Sun · Zhiyuan Liu · Maosong Sun · Kaifeng Lyu · Wenguang Chen
|
-
|
Algorithmic Stability of Minimum-Norm Interpolating Deep Neural Networks
(
Poster
)
>
link
|
Ouns El Harzli · yoonsoo nam · Ilja Kuzborskij · Bernardo Grau · Ard Louis
|
-
|
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
(
Poster
)
>
link
|
Noam Razin · Sadhika Malladi · Adithya Bhaskar · Danqi Chen · Sanjeev Arora · Boris Hanin
|
-
|
Can Bayesian Neural Networks Make Confident Predictions?
(
Poster
)
>
link
|
Katharine Fisher
|
-
|
Provable unlearning in topic modeling and downstream tasks
(
Poster
)
>
link
|
Stanley Wei · Sadhika Malladi · Sanjeev Arora · Amartya Sanyal
|
-
|
Implicit Bias of Adam versus Gradient Descent in One-Hidden-Layer Neural Networks
(
Poster
)
>
link
|
Bhavya Vasudeva · Vatsal Sharan · Mahdi Soltanolkotabi
|
-
|
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
(
Poster
)
>
link
|
Kaiyue Wen · Huaqing Zhang · Hongzhou Lin · Jingzhao Zhang
|
-
|
HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks
(
Poster
)
>
link
|
Yongyi Yang · Jiaming Yang · Wei Hu · Michal Derezinski
|
-
|
Parameter Symmetry and Emergence of Noise Equilibrium in Stochastic Training
(
Poster
)
>
link
|
Liu Ziyin · Mingze Wang · Hongchao Li · Lei Wu
|
-
|
Improving the Gaussian Approximation in Neural Networks: Para-Gaussians and Edgeworth Expansions
(
Poster
)
>
link
|
Mihai Nica · Janosch Ortmann
|
-
|
Mixture of Parrots: Mixtures of experts improve memorization more than reasoning
(
Poster
)
>
link
|
Samy Jelassi · Clara Mohri · David Brandfonbrener · Alex Gu · Nikhil Vyas · Nikhil Anand · David Alvarez-Melis · Yuanzhi Li · Sham Kakade · Eran Malach
|
-
|
Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
(
Poster
)
>
link
|
Siyu Chen · Beining Wu · Miao Lu · Zhuoran Yang · Tianhao Wang
|
-
|
Label Noise: Ignorance Is Bliss
(
Poster
)
>
link
|
Yilun Zhu · Jianxin Zhang · Aditya Gangrade · Clay Scott
|
-
|
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
(
Poster
)
>
link
|
Francesco Mori · Stefano Sarao Mannelli · Francesca Mignacco
|
-
|
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
(
Poster
)
>
link
|
Yinuo Ren · Haoxuan Chen · Grant Rotskoff · Lexing Ying
|
-
|
Accumulating Data Avoids Model Collapse
(
Poster
)
>
link
|
Joshua Kazdan · Apratim Dey · Rylan Schaeffer · Matthias Gerstgrasser · Rafael Rafailov · David Donoho · Sanmi Koyejo
|
-
|
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
(
Poster
)
>
link
|
Nikolaos Tsilivis · Gal Vardi · Julia Kempe
|
-
|
Robust Feature Learning for Multi-Index Models in High Dimensions
(
Poster
)
>
link
|
Alireza Mousavi-Hosseini · Adel Javanmard · Murat Erdogdu
|
-
|
Classifier-Free Guidance is a Predictor-Corrector
(
Poster
)
>
link
|
Arwen Bradley · Preetum Nakkiran
|
-
|
Towards Principled Graph Transformers
(
Poster
)
>
link
|
Luis Müller · Daniel Kusuma · Blai Bonet · Christopher Morris
|