

Think Big, Teach Small: Do Language Models Distil Occam’s Razor?

Gonzalo Jaimovitch-Lopez · David Castellano Falcón · Cesar Ferri · José Hernández-Orallo

Keywords: [ Machine Learning ] [ Interpretability ] [ Few Shot Learning ]


Large language models have recently shown a remarkable ability for few-shot learning, including on patterns of an algorithmic nature. However, it remains an open question what kinds of patterns these models can capture and how many examples they need in their prompts. We frame this question as a teaching problem with strong priors, and study whether language models can identify simple algorithmic concepts from small witness sets. In particular, we explore how several GPT architectures, program induction systems and humans perform as a function of the complexity of the concept and the number of additional examples, and how much their behaviour differs. This first joint analysis of language models and machine teaching can address key questions for artificial intelligence and machine learning, such as whether some strong priors, and Occam's razor in particular, can be distilled from data, making learning from a few examples possible.
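The abstract's setup of identifying an algorithmic concept from a small witness set can be sketched as a few-shot prompt. The concept used here (string reversal) and the prompt layout are hypothetical illustrations, not the paper's actual benchmark or format:

```python
# Sketch: framing concept identification from a witness set as a
# few-shot prompt. The concept (string reversal) and the prompt
# layout are hypothetical examples, not the paper's benchmark.

def build_prompt(witness_set, query):
    """Turn input/output witness pairs into a few-shot prompt string."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in witness_set]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# A simple algorithmic concept a learner might need to induce.
concept = lambda s: s[::-1]

# A small witness set, as in machine teaching with strong priors.
witnesses = [(w, concept(w)) for w in ["abc", "hello"]]
prompt = build_prompt(witnesses, "razor")

print(prompt)
```

A teacher with strong prior assumptions about the learner (e.g. a simplicity bias) would try to choose the smallest witness set that still uniquely identifies the concept.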
