Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, so it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that an LM can synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance then improves when it is fine-tuned on its own synthetic problems and verified solutions; thus the model "improves itself" using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format in which solutions can easily be verified for correctness by execution. In experiments on publicly available LMs, test accuracy more than doubles. This approach demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.
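Concretely, a programming puzzle in the format of Schuster et al. [2021] is a short Python function `sat` that returns `True` exactly when its argument solves the problem, so any candidate solution can be checked by simply executing it. A minimal sketch of this verification loop (the specific puzzle and solution below are illustrative examples, not taken from the paper):

```python
# A programming puzzle: `sat` returns True iff its argument is a valid solution.
def sat(s: str) -> bool:
    """Find a string of length 10 that starts with 'Hello'."""
    return len(s) == 10 and s.startswith("Hello")

# A candidate solution function, as an LM might generate it.
def sol() -> str:
    return "Hello" + "world"[:5]

# The Python interpreter acts as the correctness filter:
# only (puzzle, solution) pairs passing this check are kept for fine-tuning.
assert sat(sol())
```

This executable check is what lets synthetic training data be filtered for correctness without any human labeling.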
Author Information
Patrick Haluptzok
Matthew Bowers (Massachusetts Institute of Technology)

PhD student at MIT, co-advised by Armando Solar-Lezama in EECS and Josh Tenenbaum in BCS. His research combines methods from programming languages (PL) research with machine learning to tackle problems in artificial intelligence.
Adam Kalai (Microsoft Research New England)
More from the Same Authors
- 2021: Programming Puzzles
  Tal Schuster · Ashwin Kalyan · Alex Polozov · Adam Kalai
- 2021 Spotlight: Towards optimally abstaining from prediction with OOD test examples
  Adam Kalai · Varun Kanade
- 2022 Spotlight: Are GANs overkill for NLP?
  David Alvarez-Melis · Vikas Garg · Adam Kalai
- 2022: A Theory of Unsupervised Translation for Understanding Animal Communication
  Shafi Goldwasser · David Gruber · Adam Kalai · Orr Paradise
- 2022 Poster: Are GANs overkill for NLP?
  David Alvarez-Melis · Vikas Garg · Adam Kalai
- 2022 Poster: Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms
  Surbhi Goel · Sham Kakade · Adam Kalai · Cyril Zhang
- 2021 Poster: Towards optimally abstaining from prediction with OOD test examples
  Adam Kalai · Varun Kanade
- 2018 Poster: Supervising Unsupervised Learning
  Vikas Garg · Adam Kalai
- 2018 Spotlight: Supervising Unsupervised Learning
  Vikas Garg · Adam Kalai
- 2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
  Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir
- 2009 Poster: Potential-Based Agnostic Boosting
  Adam Kalai · Varun Kanade
- 2006 Demonstration: Intelligent Ink Processing
  Sashi Raghupathy · Qi Zhang · Patrick Haluptzok