Skip to yearly menu bar Skip to main content

Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

LLMs vs ITPs

Simon Frieder · Martin Trimmel · Rashid Alawadhi · Klaus Gy

Keywords: [ interactive theorem provers ] [ language models ] [ Evaluation ]


Previous work has shown that large language models (LLMs) can function as searchable mathematical knowledge bases, as during their pre/finetuning phase, they ingest a large number of mathematical books and articles. The only other searchable mathematical knowledge base of comparable size and range of mathematical difficulty levels are the libraries that are associated with interactive theorem provers (ITPs). This work is the first that compares these two forms of mathematical knowledge bases. Aside from intrinsic interest in such a comparison, such a comparison is necessary to lay the groundwork for the accelerating trend of merging LLM and ITP technology. We focus on GPT-4 as the LLM of choice and use Claude 2 as a fallback if GPT-4 cannot solve a task, and compare the combined mathematical knowledge bases of GPT-4 and Claude 2 to proof libraries associated with ITPs like Isabelle, Coq, and Lean. We release our dataset publicly.

Chat is not available.