3 Results
Poster | Wed 11:00 | In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization | Ruiqi Zhang · Jingfeng Wu · Peter Bartlett
Poster | Fri 16:30 | SGD vs GD: Rank Deficiency in Linear Networks | Aditya Vardhan Varre · Margarita Sagitova · Nicolas Flammarion
Workshop | | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models | Frederik Kunstner · Robin Yadav · Alan Milligan · Mark Schmidt · Alberto Bietti