Sat 6:50 a.m. - 7:00 a.m.
|
Opening Remarks
(
Opening Remarks
)
>
SlidesLive Video
|
|
Sat 7:00 a.m. - 7:45 a.m.
|
From algorithms to neural networks and back
(
Invited Talk
)
>
SlidesLive Video
|
Andrej Risteski
|
Sat 7:45 a.m. - 8:30 a.m.
|
How do two-layer neural networks learn complex functions from data over time?
(
Invited Talk
)
>
SlidesLive Video
|
Florent Krzakala
|
Sat 8:30 a.m. - 8:40 a.m.
|
Feature Learning in Infinite-Depth Neural Networks
(
Oral
)
>
link
SlidesLive Video
|
Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou
|
Sat 8:40 a.m. - 8:50 a.m.
|
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
(
Oral
)
>
link
SlidesLive Video
|
Yilong Qin · Andrej Risteski
|
Sat 8:50 a.m. - 9:00 a.m.
|
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
(
Oral
)
>
link
SlidesLive Video
|
Song Mei · Yuchen Wu
|
Sat 9:00 a.m. - 9:10 a.m.
|
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
(
Oral
)
>
link
SlidesLive Video
|
Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu
|
Sat 9:10 a.m. - 10:10 a.m.
|
Poster Session
(
Poster Session
)
>
|
|
Sat 10:10 a.m. - 11:15 a.m.
|
Lunch Break
(
Lunch Break
)
>
|
|
Sat 11:15 a.m. - 12:00 p.m.
|
Benefits of learning with symmetries: eigenvectors, graph representations and sample complexity
(
Invited Talk
)
>
SlidesLive Video
|
Stefanie Jegelka
|
Sat 12:00 p.m. - 12:15 p.m.
|
Break
|
|
Sat 12:15 p.m. - 1:00 p.m.
|
Adaptivity in Domain Adaptation and Friends
(
Invited Talk
)
>
SlidesLive Video
|
Samory Kpotufe
|
Sat 1:00 p.m. - 1:10 p.m.
|
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
(
Oral
)
>
link
SlidesLive Video
|
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan
|
Sat 1:10 p.m. - 1:20 p.m.
|
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
(
Oral
)
>
link
SlidesLive Video
|
Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu
|
Sat 1:20 p.m. - 1:30 p.m.
|
In-Context Convergence of Transformers
(
Oral
)
>
link
SlidesLive Video
|
Yu Huang · Yuan Cheng · Yingbin Liang
|
Sat 1:30 p.m. - 1:40 p.m.
|
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study
(
Oral
)
>
link
SlidesLive Video
|
Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
|
Sat 1:40 p.m. - 1:50 p.m.
|
Linear attention is (maybe) all you need (to understand transformer optimization)
(
Oral
)
>
link
SlidesLive Video
|
Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra
|
Sat 1:50 p.m. - 2:00 p.m.
|
Closing Remarks
(
Closing Remarks
)
>
SlidesLive Video
|
|
Sat 2:00 p.m. - 3:00 p.m.
|
Poster Session
(
Poster Session
)
>
|
|
-
|
A PAC-Bayesian Perspective on the Interpolating Information Criterion
(
Poster
)
>
link
|
Liam Hodgkinson · Chris van der Heide · Robert Salomone · Fred Roosta · Michael Mahoney
|
-
|
Graph Neural Networks Benefit from Structural Information Provably: A Feature Learning Perspective
(
Poster
)
>
link
|
Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki
|
-
|
Linear attention is (maybe) all you need (to understand transformer optimization)
(
Poster
)
>
link
|
Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra
|
-
|
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study
(
Poster
)
>
link
|
Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
|
-
|
Feature Learning in Infinite-Depth Neural Networks
(
Poster
)
>
link
|
Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou
|
-
|
Variational Classification
(
Poster
)
>
link
|
Shehzaad Dhuliawala · Mrinmaya Sachan · Carl Allen
|
-
|
Implicit biases in multitask and continual learning from a backward error analysis perspective
(
Poster
)
>
link
|
Benoit Dherin
|
-
|
Spectrum Extraction and Clipping for Implicitly Linear Layers
(
Poster
)
>
link
|
Ali Ebrahimpour-Boroojeny · Matus Telgarsky · Hari Sundaram
|
-
|
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
(
Poster
)
>
link
|
Mingze Wang · Lei Wu
|
-
|
Curvature-Dimension Tradeoff for Generalization in Hyperbolic Space
(
Poster
)
>
link
|
Nicolás Alvarado · Hans Lobel · Mircea Petrache
|
-
|
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
(
Poster
)
>
link
|
GuanWen Qiu · Da Kuang · Surbhi Goel
|
-
|
Unveiling the Hessian's Connection to the Decision Boundary
(
Poster
)
>
link
|
Mahalakshmi Sabanayagam · Freya Behrens · Urte Adomaityte · Anna Dawid
|
-
|
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
(
Poster
)
>
link
|
Zixuan Zhang · Kaiqi Zhang · Minshuo Chen · Yuma Takeda · Mengdi Wang · Tuo Zhao · Yu-Xiang Wang
|
-
|
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
(
Poster
)
>
link
|
Ekaterina Lobacheva · Eduard Pokonechny · Maxim Kodryan · Dmitry Vetrov
|
-
|
Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization
(
Poster
)
>
link
|
Atli Kosson · Dongyang Fan · Martin Jaggi
|
-
|
Generalization Guarantees of Deep ResNets in the Mean-Field Regime
(
Poster
)
>
link
|
Yihang Chen · Fanghui Liu · Yiping Lu · Grigorios Chrysos · Volkan Cevher
|
-
|
Theoretical Explanation for Generalization from Adversarial Perturbations
(
Poster
)
>
link
|
Soichiro Kumano · Hiroshi Kera · Toshihiko Yamasaki
|
-
|
In-Context Convergence of Transformers
(
Poster
)
>
link
|
Yu Huang · Yuan Cheng · Yingbin Liang
|
-
|
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
(
Poster
)
>
link
|
Yatin Dandi · Florent Krzakala · Bruno Loureiro · Luca Pesce · Ludovic Stephan
|
-
|
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
(
Poster
)
>
link
|
Ziqiao Wang · Yongyi Mao
|
-
|
Unraveling the Complexities of Simplicity Bias: Mitigating and Amplifying Factors
(
Poster
)
>
link
|
Xuchen Gong · Tianwen Fu
|
-
|
Transformers as Support Vector Machines
(
Poster
)
>
link
|
Davoud Ataee Tarzanagh · Yingcong Li · Christos Thrampoulidis · Samet Oymak
|
-
|
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
(
Poster
)
>
link
|
Juno Kim · Kakei Yamamoto · Kazusato Oko · Zhuoran Yang · Taiji Suzuki
|
-
|
A Theoretical Study of Dataset Distillation
(
Poster
)
>
link
|
Zachary Izzo · James Zou
|
-
|
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
(
Poster
)
>
link
|
Deqing Fu · Tian-qi Chen · Robin Jia · Vatsal Sharan
|
-
|
Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
(
Poster
)
>
link
|
Kajetan Schweighofer · Lukas Aichberger · Mykyta Ielanskyi · Sepp Hochreiter
|
-
|
In-Context Learning on Unstructured Data: Softmax Attention as a Mixture of Experts
(
Poster
)
>
link
|
Kevin Christian Wibisono · Yixin Wang
|
-
|
Attention-Only Transformers and Implementing MLPs with Attention Heads
(
Poster
)
>
link
|
Robert Huben · Valerie Morris
|
-
|
Privacy at Interpolation: Precise Analysis for Random and NTK Features
(
Poster
)
>
link
|
Simone Bombari · Marco Mondelli
|
-
|
Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation
(
Poster
)
>
link
|
Chinmaya Kausik · Kashvi Srivastava · Rishi Sonthalia
|
-
|
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
(
Poster
)
>
link
|
Behrad Moniri · Donghwan Lee · Hamed Hassani · Edgar Dobriban
|
-
|
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
(
Poster
)
>
link
|
Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu
|
-
|
How does Gradient Descent Learn Features --- A Local Analysis for Regularized Two-Layer Neural Networks
(
Poster
)
>
link
|
Mo Zhou · Rong Ge
|
-
|
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
(
Poster
)
>
link
|
Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu
|
-
|
Provably Efficient CVaR RL in Low-rank MDPs
(
Poster
)
>
link
|
Yulai Zhao · Wenhao Zhan · Xiaoyan Hu · Ho-fung Leung · Farzan Farnia · Wen Sun · Jason Lee
|
-
|
Analysis of Task Transferability in Large Pre-trained Classifiers
(
Poster
)
>
link
|
Akshay Mehra · Yunbei Zhang · Jihun Hamm
|
-
|
On Scale-Invariant Sharpness Measures
(
Poster
)
>
link
|
Behrooz Tahmasebi · Ashkan Soleymani · Stefanie Jegelka · Patrick Jaillet
|
-
|
Gibbs-Based Information Criteria and the Over-Parameterized Regime
(
Poster
)
>
link
|
Haobo Chen · Yuheng Bu · Gregory Wornell
|
-
|
Grokking modular arithmetic can be explained by margin maximization
(
Poster
)
>
link
|
Mohamad Amin Mohamadi · Zhiyuan Li · Lei Wu · Danica J. Sutherland
|
-
|
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
(
Poster
)
>
link
|
Fadhel Ayed · Francois Caron · Paul Jung · Juho Lee · Hoil Lee · Hongseok Yang
|
-
|
On the Computational Complexity of Inverting Generative Models
(
Poster
)
>
link
|
Feyza Duman Keles · Chinmay Hegde
|
-
|
Flow-Based High-Dimensionally Distributional Robust Optimization
(
Poster
)
>
link
|
Chen Xu · Jonghyeok Lee · Xiuyuan Cheng · Yao Xie
|
-
|
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
(
Poster
)
>
link
|
Licong Lin · Yu Bai · Song Mei
|
-
|
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
(
Poster
)
>
link
|
Tianyu Guo · Wei Hu · Song Mei · Huan Wang · Caiming Xiong · Silvio Savarese · Yu Bai
|
-
|
A Theoretical Explanation of Deep RL Performance in Stochastic Environments
(
Poster
)
>
link
|
Cassidy Laidlaw · Banghua Zhu · Stuart J Russell · Anca Dragan
|
-
|
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
(
Poster
)
>
link
|
Song Mei · Yuchen Wu
|
-
|
Under-Parameterized Double Descent for Ridge Regularized Least Squares Denoising of Data on a Line
(
Poster
)
>
link
|
Rishi Sonthalia · Xinyue (Serena) Li · Bochao Gu
|
-
|
Continual Learning for Long-Tailed Recognition: Bridging the Gap in Theory and Practice
(
Poster
)
>
link
|
Mahdiyar Molahasani · Ali Etemad · Michael Greenspan
|
-
|
SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning
(
Poster
)
>
link
|
Alice Bizeul · Carl Allen
|
-
|
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
(
Poster
)
>
link
|
Atli Kosson · Bettina Messmer · Martin Jaggi
|
-
|
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
(
Poster
)
>
link
|
Miao Lu · Beining Wu · Xiaodong Yang · Difan Zou
|
-
|
On Compositionality and Emergence in Physical Systems Generative Modeling
(
Poster
)
>
link
|
Justin Diamond
|
-
|
Escaping Random Teacher Initialization Enhances Signal Propagation and Representations
(
Poster
)
>
link
|
Felix Sarnthein · Sidak Pal Singh · Antonio Orvieto · Thomas Hofmann
|
-
|
The Expressive Power of Transformers with Chain of Thought
(
Poster
)
>
link
|
William Merrill · Ashish Sabharwal
|
-
|
Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning
(
Poster
)
>
link
|
Hongkang Li · Meng Wang · Songtao Lu · Hui Wan · Xiaodong Cui · Pin-Yu Chen
|
-
|
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
(
Poster
)
>
link
|
Yilong Qin · Andrej Risteski
|
-
|
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
(
Poster
)
>
link
|
Qingyue Zhao · Banghua Zhu
|
-
|
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
(
Poster
)
>
link
|
Elan Rosenfeld · Andrej Risteski
|
-
|
MoXCo: How I learned to stop exploring and love my local minima?
(
Poster
)
>
link
|
Esha Singh · Shoham Sabach · Yu-Xiang Wang
|
-
|
First-order ANIL provably learns representations despite overparametrisation
(
Poster
)
>
link
|
Oguz Kaan Yuksel · Etienne Boursier · Nicolas Flammarion
|
-
|
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
(
Poster
)
>
link
|
Eduardo Dadalto Câmara Gomes · Marco Romanelli · Georg Pichler · Pablo Piantanida
|
-
|
Non-Vacuous Generalization Bounds for Large Language Models
(
Poster
)
>
link
|
Sanae Lotfi · Marc Finzi · Yilun Kuang · Tim G. J. Rudner · Micah Goldblum · Andrew Wilson
|
-
|
Learning from setbacks: the impact of adversarial initialization on generalization performance
(
Poster
)
>
link
|
Yatin Dandi · Stefani Karp · Francesca Mignacco · Kavya Ravichandran
|
-
|
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
(
Poster
)
>
link
|
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan
|
-
|
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
(
Poster
)
>
link
|
Szilvia Ujváry · Gergely Flamich · Vincent Fortuin · José Miguel Hernández-Lobato
|
-
|
Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
(
Poster
)
>
link
|
Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Ila Fiete · Andrey Gromov · Sanmi Koyejo
|
-
|
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
(
Poster
)
>
link
|
Yuqing Wang · Zhenghao Xu · Tuo Zhao · Molei Tao
|
-
|
Toward Student-oriented Teacher Network Training for Knowledge Distillation
(
Poster
)
>
link
|
Chengyu Dong · Liyuan Liu · Jingbo Shang
|
-
|
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
(
Poster
)
>
link
|
Anna Bair · Hongxu Yin · Maying Shen · Pavlo Molchanov · Jose M. Alvarez
|
-
|
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations
(
Poster
)
>
link
|
Can Yaras · Peng Wang · Wei Hu · Zhihui Zhu · Laura Balzano · Qing Qu
|
-
|
How Structured Data Guides Feature Learning: A Case Study of the Parity Problem
(
Poster
)
>
link
|
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu
|
-
|
The Next Symbol Prediction Problem: PAC-learning and its relation to Language Models
(
Poster
)
>
link
|
Satwik Bhattamishra · Phil Blunsom · Varun Kanade
|
-
|
Why Do We Need Weight Decay for Overparameterized Deep Networks?
(
Poster
)
>
link
|
Maksym Andriushchenko · Francesco D'Angelo · Aditya Vardhan Varre · Nicolas Flammarion
|
-
|
The Double-Edged Sword: Perception and Uncertainty in Inverse Problems
(
Poster
)
>
link
|
Regev Cohen · Ehud Rivlin · Daniel Freedman
|
-
|
Near-Interpolators: Fast Norm Growth and Tempered Near-Overfitting
(
Poster
)
>
link
|
Yutong Wang · Rishi Sonthalia · Wei Hu
|
-
|
On robust overfitting: adversarial training induced distribution matters
(
Poster
)
>
link
|
Runzhi Tian · Yongyi Mao
|
-
|
Are Graph Neural Networks Optimal Approximation Algorithms?
(
Poster
)
>
link
|
Morris Yau · Eric Lu · Nikolaos Karalias · Jessica Xu · Stefanie Jegelka
|
-
|
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
(
Poster
)
>
link
|
Yuandong Tian · Yiping Wang · Zhenyu Zhang · Beidi Chen · Simon Du
|