

Search All 2024 Events
 

3 Results

Workshop
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen · Huaqing Zhang · Hongzhou Lin · Jingzhao Zhang
Workshop
Progressive distillation induces an implicit curriculum
Abhishek Panigrahi · Bingbin Liu · Sadhika Malladi · Andrej Risteski · Surbhi Goel
Poster · Fri 11:00
An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem
Yoonsoo Nam · Nayara Fonseca · Seok Hyeong Lee · Chris Mingard · Ard Louis