
Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

TinyGSM: achieving 80% on GSM8k with one billion parameters

Bingbin Liu · Sebastien Bubeck · Ronen Eldan · Janardhan Kulkarni · Yuanzhi Li · Anh Nguyen · Rachel Ward · Yi Zhang

Keywords: [ Reasoning ] [ Distillation ] [ math word problem ] [ GSM8K ] [ small language models ] [ verifier ]


Small models offer various computational advantages, yet the extent to which size is critical for problem-solving abilities remains an open question. This work studies the performance of small models on mathematical reasoning. Specifically, for solving math word problems, we find that a 1.3B model can achieve 80.1% accuracy on GSM8K, outperforming existing models that are orders of magnitude larger, and even rivaling the performance of the GPT-3.5-turbo teacher model from which the training data is generated. Our approach is simple and has two key components: the first is the use of a GPT-3.5-turbo-generated synthetic dataset of math word problems with solutions, which we will fully release. The second component is the use of a verifier, which selects the final output from multiple candidate generations.
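The verifier component amounts to best-of-N selection: the model samples several candidate solutions, the verifier scores each one, and the highest-scoring candidate is returned. A minimal sketch of this selection step, using a hypothetical stand-in scoring function rather than the paper's trained verifier model:

```python
# Best-of-N selection with a verifier, sketched under the assumption that
# the verifier maps each candidate solution to a scalar score (higher = better).

def select_with_verifier(candidates, verifier_score):
    """Return the candidate the verifier scores highest."""
    return max(candidates, key=verifier_score)

# Toy usage: three candidate solutions to a word problem whose answer is 42.
# The lambda is a hypothetical stand-in verifier that rewards the correct
# final answer; in the paper this role is played by a trained scoring model.
candidates = ["final answer: 41", "final answer: 42", "final answer: 40"]
stand_in_score = lambda c: 1.0 if c.endswith("42") else 0.0
print(select_with_verifier(candidates, stand_in_score))  # -> final answer: 42
```

In practice the candidates would come from temperature sampling of the 1.3B model, and the verifier's score would reflect the model's estimate of solution correctness rather than a string match.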
