

Poster

DataComp-LM: In search of the next generation of training sets for language models

Jeffrey Li ⋅ Alex Fang ⋅ Georgios Smyrnis ⋅ Maor Ivgi ⋅ Matt Jordan ⋅ Samir Yitzhak Gadre ⋅ Hritik Bansal ⋅ Etash Guha ⋅ Sedrick Scott Keh ⋅ Kushal Arora ⋅ Saurabh Garg ⋅ Rui Xin ⋅ Niklas Muennighoff ⋅ Reinhard Heckel ⋅ Jean Mercat ⋅ Mayee Chen ⋅ Suchin Gururangan ⋅ Mitchell Wortsman ⋅ Alon Albalak ⋅ Yonatan Bitton ⋅ Marianna Nezhurina ⋅ Amro Abbas ⋅ Cheng-Yu Hsieh ⋅ Dhruba Ghosh ⋅ Josh Gardner ⋅ Maciej Kilian ⋅ Hanlin Zhang ⋅ Rulin Shao ⋅ Sarah Pratt ⋅ Sunny Sanyal ⋅ Gabriel Ilharco ⋅ Giannis Daras ⋅ Kalyani Marathe ⋅ Aaron Gokaslan ⋅ Jieyu Zhang ⋅ Khyathi Chandu ⋅ Thao Nguyen ⋅ Igor Vasiljevic ⋅ Sham Kakade ⋅ Shuran Song ⋅ Sujay Sanghavi ⋅ Fartash Faghri ⋅ Sewoong Oh ⋅ Luke Zettlemoyer ⋅ Kyle Lo ⋅ Alaaeldin El-Nouby ⋅ Hadi Pouransari ⋅ Alexander Toshev ⋅ Stephanie Wang ⋅ Dirk Groeneveld ⋅ Luca Soldaini ⋅ Pang Wei Koh ⋅ Jenia Jitsev ⋅ Thomas Kollar ⋅ Alex Dimakis ⋅ Yair Carmon ⋅ Achal Dave ⋅ Ludwig Schmidt ⋅ Vaishaal Shankar
2024 Poster