Abstract:

Language models (LMs) have become ubiquitous in both AI research and commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed.

In this talk, I present our OLMo project, which aims to build strong language models and make them fully accessible to researchers, along with open-source code for data, training, and inference. Training language models is expensive, so we optimize for quality relative to compute cost. I focus on how improvements in data, architecture, and training advance models at both the pre-training and post-training stages at lower compute cost.