

Spotlight Poster

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Guilherme Penedo ⋅ Hynek Kydlíček ⋅ Loubna Ben Allal ⋅ Anton Lozhkov ⋅ Margaret Mitchell ⋅ Colin Raffel ⋅ Leandro von Werra ⋅ Thomas Wolf
2024 Spotlight Poster

