Poster in Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

WAMP: A Competition-Level Dataset for Assessing the Mathematical Reasoning Capabilities of LLMs

Yujun Mao · Yoon Kim · Yilun Zhou

Keywords: [ LLM Evaluation ] [ Math Dataset ] [ LLM Math Reasoning ]


Abstract:

Recent large language models (LLMs) have shown signs of nontrivial mathematical reasoning ability. However, it remains unclear how a model generates its reasoning procedure, especially for challenging math problems. In this paper, we propose a challenging dataset for better understanding the mathematical reasoning capabilities of LLMs. The Web of Annotated Math Problems, or WAMP, consists of competition-level math problems annotated with "knowledge pieces" (general math facts) and "hints" (problem-specific tricks). This "web" is structured as a graph in which each problem is a node connected to the nodes of its associated knowledge pieces and hints. In addition to constructing the graph, we manually annotate step-wise solutions for each problem. As an initial step, we apply this dataset to investigate whether augmenting a problem with its relevant knowledge pieces and hints improves model performance.
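
One way to picture the graph structure described in the abstract is the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' data format or evaluation code: the class names `Problem` and `Web`, the field names, the prompt layout, and the toy problem are all hypothetical and are not taken from WAMP.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Problem:
    """A problem node with edges to knowledge-piece and hint nodes (hypothetical schema)."""
    text: str
    knowledge_ids: list[str] = field(default_factory=list)
    hint_ids: list[str] = field(default_factory=list)


@dataclass
class Web:
    """A small graph: problems link to shared knowledge pieces and to hints."""
    problems: dict[str, Problem] = field(default_factory=dict)
    knowledge: dict[str, str] = field(default_factory=dict)
    hints: dict[str, str] = field(default_factory=dict)

    def augmented_prompt(self, problem_id: str) -> str:
        """Build a prompt from the problem plus its linked knowledge pieces and hints."""
        p = self.problems[problem_id]
        lines = [f"Problem: {p.text}"]
        lines += [f"Knowledge: {self.knowledge[k]}" for k in p.knowledge_ids]
        lines += [f"Hint: {self.hints[h]}" for h in p.hint_ids]
        return "\n".join(lines)

    def problems_sharing(self, knowledge_id: str) -> list[str]:
        """Return the ids of all problems connected to a given knowledge piece."""
        return sorted(
            pid for pid, p in self.problems.items()
            if knowledge_id in p.knowledge_ids
        )


# Toy data invented for illustration only; it does not come from the dataset.
web = Web(
    problems={
        "p1": Problem(
            text="Find the remainder when 2^10 is divided by 7.",
            knowledge_ids=["k1"],
            hint_ids=["h1"],
        )
    },
    knowledge={"k1": "Fermat's little theorem: a^(p-1) = 1 (mod p) for prime p not dividing a."},
    hints={"h1": "Write 2^10 as 2^6 * 2^4."},
)

prompt = web.augmented_prompt("p1")
print(prompt)
```

Keeping knowledge pieces in a shared table rather than inside each problem reflects the distinction drawn in the abstract: general facts can be linked from many problems, while hints are specific to one.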
