3 Results
Workshop | Sun 16:30 | Token-token correlations predict the scaling of the test loss with the number of input tokens | Francesco Cagnetta · Matthieu Wyart
Workshop | | Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models | Jonas Zausinger · Lars Pennig · Kacper Chlodny · Vincent Limbach · Anna Ketteler · Thorben Prein · Vishwa Mohan Singh · Michael Danziger · Jannis Born
Poster | Thu 16:30 | Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Szymon Antoniak · Michał Krutul · Maciej Pióro · Jakub Krajewski · Jan Ludziejewski · Kamil Ciebiera · Krystian Król · Tomasz Odrzygóźdź · Marek Cygan · Sebastian Jaszczur