3 Results
Type | Time | Title | Authors
Workshop | | From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency | Kaiyue Wen · Huaqing Zhang · Hongzhou Lin · Jingzhao Zhang
Workshop | | Progressive distillation induces an implicit curriculum | Abhishek Panigrahi · Bingbin Liu · Sadhika Malladi · Andrej Risteski · Surbhi Goel
Poster | Fri 11:00 | An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem | Yoonsoo Nam · Nayara Fonseca · Seok Hyeong Lee · Chris Mingard · Ard Louis