Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

Margaret Li

Abstract
