Competition: NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Invited Speaker: Mojan Javaheripi (Microsoft Research) - Unleashing the power of Small Language Models

Abstract:

Over the past few months, we have released a suite of small language models (SLMs) called “Phi” that achieve unprecedented performance on a variety of benchmarks. Our first model, the 1.3-billion-parameter Phi-1, achieved state-of-the-art performance on Python coding among SLMs. We then extended our focus to commonsense reasoning and language understanding, creating a new 1.3-billion-parameter model named Phi-1.5 with performance comparable to models 5x larger. Our latest model, the 2.7-billion-parameter Phi-2, surpasses Phi-1.5 on all benchmarks, thanks to new innovations in model scaling and training-data curation. In this talk, I will introduce the Phi SLMs and discuss two key insights driving their performance: (1) the generation and use of “textbook-quality” data, which elevates the learning process relative to conventional web data, and (2) the incorporation of best practices for scaling up models to enhance overall performance.
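
As a rough illustration of insight (1), the sketch below filters a web corpus by a document-level quality score, keeping only passages that look “textbook-like.” This is a minimal sketch, not the Phi team's actual pipeline: the `Document` type, the `quality_score` heuristic, and the threshold are all hypothetical stand-ins, and a real filter would use a trained classifier's predicted educational-value score rather than this toy proxy.

```python
# Hedged sketch: filtering a web corpus by an "educational value" score.
# NOTE: this is NOT the Phi data pipeline; quality_score is a hypothetical
# stand-in for a trained quality classifier.

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Document:
    text: str
    source: str


def quality_score(doc: Document) -> float:
    """Toy proxy for a learned classifier rating how 'textbook-like' a
    passage is (clear exposition, self-contained explanation). A real
    pipeline would score documents with a trained model instead."""
    words = doc.text.split()
    if not words:
        return 0.0
    # Reward lexical diversity and (mildly) longer passages.
    diversity = len({w.lower() for w in words}) / len(words)
    length_bonus = min(len(words) / 500.0, 1.0)
    return 0.5 * diversity + 0.5 * length_bonus


def filter_corpus(
    docs: Iterable[Document],
    scorer: Callable[[Document], float],
    threshold: float = 0.45,
) -> Iterator[Document]:
    """Yield only documents whose quality score clears the threshold."""
    for doc in docs:
        if scorer(doc) >= threshold:
            yield doc


if __name__ == "__main__":
    corpus = [
        Document("Click here to win a prize now now now", "spam.example"),
        Document(
            "A binary search tree stores keys so that an in-order "
            "traversal yields them in sorted order. Each lookup halves "
            "the search space when the tree is balanced.",
            "tutorial.example",
        ),
    ]
    for d in filter_corpus(corpus, quality_score):
        print(d.source)  # keeps only the expository passage
```

The design point is that the filter is document-level and scorer-agnostic: swapping the toy heuristic for a classifier's probability leaves the surrounding pipeline unchanged, which is how a quality filter can be iterated on independently of corpus plumbing.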
