

Search All 2024 Events
 

3 Results

Poster
Wed 11:00 In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang · Jingfeng Wu · Peter Bartlett
Poster
Fri 16:30 SGD vs GD: Rank Deficiency in Linear Networks
Aditya Vardhan Varre · Margarita Sagitova · Nicolas Flammarion
Workshop
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner · Robin Yadav · Alan Milligan · Mark Schmidt · Alberto Bietti