Scaling Smart: Accelerating Large Language Model Pre-Training with Small Model Initialization

Mohammad Samragh ⋅ Iman Mirzadeh ⋅ Keivan Alizadeh-Vahid ⋅ Fartash Faghri ⋅ Minsik Cho ⋅ Moin Nabi ⋅ Devang Naik ⋅ Mehrdad Farajtabar
Keywords: Efficient Training

Abstract
