Training Transformers Together
Alexander Borzunov · Max Ryabinin · Tim Dettmers · Quentin Lhoest · Lucile Saulnier · Michael Diskin · Yacine Jernite · Thomas Wolf
Event URL: https://training-transformers-together.github.io/
We invite volunteers to train a large Transformer language model over the Internet. Instead of using supercomputers, we will pool all available computational resources: desktops, laptops, servers, and even cloud TPUs from around the world. All training artifacts, such as model checkpoints and optimizer states, will be shared online for public use.
For this demonstration, we will provide an open-source starter kit that volunteers can use to join the global distributed training run and host similar experiments independently in the future.
Author Information
Alexander Borzunov (HSE University, Yandex)
Max Ryabinin (Yandex, Higher School of Economics)
Tim Dettmers (University of Washington)
Quentin Lhoest (Hugging Face)
Lucile Saulnier (Hugging Face)
Michael Diskin (Yandex, Higher School of Economics)
Yacine Jernite (Facebook FAIR NYC)
Thomas Wolf (🤗 Hugging Face)
More from the Same Authors
-
2023 : Interactive Panel Discussion »
Tanya Roosta · Tim Dettmers · Minjia Zhang · Nazneen Rajani -
2022 Spotlight: Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees »
Aleksandr Beznosikov · Peter Richtarik · Michael Diskin · Max Ryabinin · Alexander Gasnikov -
2022 : BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model »
Christopher Akiki · Giada Pistilli · Margot Mieskes · Matthias Gallé · Thomas Wolf · Suzana Ilic · Yacine Jernite -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 : Towards Openness Beyond Open Access: User Journeys through 3 Open AI Collaboratives »
Jennifer Ding · Christopher Akiki · Yacine Jernite · Anne Steele · Temi Popo -
2022 : 8-bit Methods for Efficient Deep Learning »
Tim Dettmers -
2022 Poster: GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale »
Tim Dettmers · Mike Lewis · Younes Belkada · Luke Zettlemoyer -
2022 Poster: The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset »
Hugo Laurençon · Lucile Saulnier · Thomas Wang · Christopher Akiki · Albert Villanova del Moral · Teven Le Scao · Leandro Von Werra · Chenghao Mou · Eduardo González Ponferrada · Huu Nguyen · Jörg Frohberg · Mario Šaško · Quentin Lhoest · Angelina McMillan-Major · Gerard Dupont · Stella Biderman · Anna Rogers · Loubna Ben allal · Francesco De Toni · Giada Pistilli · Olivier Nguyen · Somaieh Nikpoor · Maraim Masoud · Pierre Colombo · Javier de la Rosa · Paulo Villegas · Tristan Thrush · Shayne Longpre · Sebastian Nagel · Leon Weber · Manuel Muñoz · Jian Zhu · Daniel Van Strien · Zaid Alyafeai · Khalid Almubarak · Minh Chien Vu · Itziar Gonzalez-Dios · Aitor Soroa · Kyle Lo · Manan Dey · Pedro Ortiz Suarez · Aaron Gokaslan · Shamik Bose · David Adelani · Long Phan · Hieu Tran · Ian Yu · Suhas Pai · Jenny Chim · Violette Lepercq · Suzana Ilic · Margaret Mitchell · Sasha Alexandra Luccioni · Yacine Jernite -
2022 Poster: Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees »
Aleksandr Beznosikov · Peter Richtarik · Michael Diskin · Max Ryabinin · Alexander Gasnikov -
2021 Poster: Distributed Deep Learning In Open Collaborations »
Michael Diskin · Alexey Bukhtiyarov · Max Ryabinin · Lucile Saulnier · Quentin Lhoest · Anton Sinitsin · Dmitry Popov · Dmitry V. Pyrkin · Maxim Kashirin · Alexander Borzunov · Albert Villanova del Moral · Denis Mazur · Ilia Kobelev · Yacine Jernite · Thomas Wolf · Gennady Pekhimenko -
2021 Poster: Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices »
Max Ryabinin · Eduard Gorbunov · Vsevolod Plokhotnyuk · Gennady Pekhimenko -
2021 Poster: Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets »
Max Ryabinin · Andrey Malinin · Mark Gales -
2020 Poster: Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts »
Max Ryabinin · Anton Gusev -
2020 Poster: Movement Pruning: Adaptive Sparsity by Fine-Tuning »
Victor Sanh · Thomas Wolf · Alexander Rush -
2020 : An introduction to transfer learning in NLP and HuggingFace »
Thomas Wolf -
2013 Poster: Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests »
Yacine Jernite · Yoni Halpern · David Sontag