

Poster

Multi-language Diversity Benefits Autoformalization

Albert Q. Jiang · Wenda Li · Mateja Jamnik

East Exhibit Hall A-C #2909
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Autoformalization is the task of translating natural-language material into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset of informal-formal pairs expressing the same mathematical content. Existing methods tend to circumvent this challenge by manually curating small corpora or by few-shot learning with large language models, but these methods suffer from data scarcity and from the difficulty of acquiring formal languages. In this work, we create MMA, a large, flexible, multi-language, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. Experiments show that language models fine-tuned on MMA produce up to $29$-$31$\% of statements acceptable with minimal corrections on the miniF2F and ProofNet benchmarks, up from $0$\% with the base model. We also demonstrate that fine-tuning on multi-language formal data yields more capable autoformalization models, even on single-language tasks.
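The core data-construction idea can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `informalize` is a hypothetical stand-in for a language-model call, and the example statement is invented for demonstration.

```python
# Sketch of reverse translation (informalisation): start from formal
# statements and use a language model to generate informal counterparts,
# yielding informal-formal training pairs for autoformalization.
# `informalize` is a hypothetical placeholder for an LLM call.

from typing import Callable, Dict, List


def build_training_pairs(
    formal_statements: List[str],
    informalize: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each formal statement with a model-generated informal one.

    The resulting pairs can then be used to fine-tune a model in the
    forward (informal -> formal) direction.
    """
    pairs = []
    for formal in formal_statements:
        informal = informalize(formal)  # LLM writes the natural-language version
        pairs.append({"informal": informal, "formal": formal})
    return pairs


if __name__ == "__main__":
    # Toy stand-in for the language model: a canned lookup table.
    toy_model = {
        "theorem add_comm (a b : Nat) : a + b = b + a":
            "Addition of natural numbers is commutative.",
    }

    demo = build_training_pairs(
        ["theorem add_comm (a b : Nat) : a + b = b + a"],
        lambda s: toy_model.get(s, ""),
    )
    print(demo[0]["informal"])  # → Addition of natural numbers is commutative.
```

Because the formal side is machine-checked by construction, only the informal side is model-generated, which is what makes collecting pairs at scale tractable.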
