Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish · Leon Li · John Kirchenbauer · Dayal Singh Kalra · Brian Bartoldson · Bhavya Kailkhura · Avi Schwarzschild · Jonas Geiping · Micah Goldblum · Tom Goldstein
Abstract
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on grade-school math, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
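To illustrate the core idea, the sketch below shows a shared block applied a variable number of times, with a curriculum that raises the recurrence count over training. This is a minimal illustration under assumed conventions, not the authors' implementation; the names `RecurrentBlock`, `recurrence_schedule`, and `recurrent_forward` are hypothetical.

```python
# Minimal sketch (not the paper's code) of depth recurrence with a curriculum:
# a single shared block is applied n times, and n grows over training so the
# effective depth increases while parameter count stays fixed.
import torch
import torch.nn as nn


class RecurrentBlock(nn.Module):
    """One transformer-style block reused across recurrent iterations."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x


def recurrence_schedule(step: int, total_steps: int, max_recurrences: int) -> int:
    """Linearly ramp the number of recurrent applications from 1 to the maximum.
    (Hypothetical schedule; the paper's curriculum may differ.)"""
    frac = min(step / max(total_steps, 1), 1.0)
    return max(1, round(1 + frac * (max_recurrences - 1)))


def recurrent_forward(block: RecurrentBlock, x: torch.Tensor, n_recurrences: int) -> torch.Tensor:
    """Apply the shared block n_recurrences times, deepening effective depth."""
    for _ in range(n_recurrences):
        x = block(x)
    return x


if __name__ == "__main__":
    block = RecurrentBlock(d_model=64)
    x = torch.randn(2, 16, 64)  # (batch, sequence, hidden)
    for step in (0, 500, 1000):
        n = recurrence_schedule(step, total_steps=1000, max_recurrences=8)
        y = recurrent_forward(block, x, n)
        print(f"step {step}: {n} recurrences, output shape {tuple(y.shape)}")
```

At test time, the recurrence count can be raised further than it was during training, which is the sense in which test-time compute is decoupled from train-time compute and parameter count.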