Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

WAMP: A Competition-Level Dataset for Assessing the Mathematical Reasoning Capabilities of LLMs

Yujun Mao · Yoon Kim · Yilun Zhou

Keywords: [ LLM Math Reasoning ] [ Math Dataset ] [ LLM Evaluation ]


Recent large language models (LLMs) have shown signs of nontrivial mathematical reasoning ability. However, it remains unclear how a model generates its reasoning procedure, especially for challenging math problems. In this paper, we propose a challenging dataset for better understanding the mathematical reasoning capabilities of LLMs. The Web of Annotated Math Problems, or WAMP, consists of competition-level math problems annotated with "knowledge pieces", or general math facts, and "hints", or problem-specific tricks. This "web" is structured as a graph in which each problem is represented by a node and is connected to the nodes of its associated knowledge pieces and hints. In addition to constructing the graph, we manually annotate step-wise solutions for each problem. As an initial step, we apply this dataset to investigate whether augmenting a problem with its relevant knowledge pieces and hints can improve model performance.
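
The graph structure described above can be illustrated with a minimal sketch. This is not the WAMP release format or the authors' code: the class `ProblemWeb`, its method names, the node identifiers, and the sample problem, knowledge piece, and hint are all hypothetical placeholders chosen for illustration. The sketch assumes an undirected graph in which a problem node links to knowledge-piece and hint nodes, and shows one plausible way to build a prompt augmented with those annotations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProblemWeb:
    """Hypothetical container: problem nodes linked to annotation nodes."""
    text: dict = field(default_factory=dict)   # node id -> node text
    kind: dict = field(default_factory=dict)   # node id -> "problem" | "knowledge" | "hint"
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, kind, text):
        self.kind[node_id] = kind
        self.text[node_id] = text

    def link(self, problem_id, annotation_id):
        # Undirected edge, so one annotation can be shared by many problems.
        self.edges[problem_id].add(annotation_id)
        self.edges[annotation_id].add(problem_id)

    def neighbors(self, node_id, kind):
        # Sorted ids of adjacent nodes of the requested kind.
        return sorted(n for n in self.edges[node_id] if self.kind[n] == kind)

    def augmented_prompt(self, problem_id):
        # Problem statement followed by its knowledge pieces, then its hints.
        lines = [f"Problem: {self.text[problem_id]}"]
        for kind, label in (("knowledge", "Knowledge"), ("hint", "Hint")):
            for n in self.neighbors(problem_id, kind):
                lines.append(f"{label}: {self.text[n]}")
        return "\n".join(lines)


# Placeholder content invented for this sketch (not taken from the dataset).
web = ProblemWeb()
web.add_node("p1", "problem", "Find the remainder when 2^10 is divided by 7.")
web.add_node("p2", "problem", "Find the remainder when 3^20 is divided by 7.")
web.add_node("k1", "knowledge",
             "Fermat's little theorem: a^(p-1) = 1 (mod p) for prime p not dividing a.")
web.add_node("h1", "hint", "Reduce the exponent modulo 6.")
web.link("p1", "k1")
web.link("p1", "h1")
web.link("p2", "k1")  # the same knowledge piece connects two problems

prompt = web.augmented_prompt("p1")
```

Here `web.neighbors("k1", "problem")` lists every problem that shares a knowledge piece, which is the kind of connection the "web" makes explicit, and `prompt` holds the problem text followed by its knowledge piece and hint.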

