NeurIPS WAMP: A Competition-Level Dataset for Assessing the Mathematical Reasoning Capabilities of LLMs

Poster in Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI

WAMP: A Competition-Level Dataset for Assessing the Mathematical Reasoning Capabilities of LLMs

Yujun Mao · Yoon Kim · Yilun Zhou

Keywords: [ LLM Math Reasoning ] [ Math Dataset ] [ LLM Evaluation ]


Abstract:

Recent large language models (LLMs) have shown indications of nontrivial mathematical reasoning ability. However, it is unclear how, especially for challenging math problems, the reasoning procedures are generated inside a model. In this paper, we propose a challenging dataset for better understanding the mathematical reasoning capabilities of LLMs. The Web of Annotated Math Problems, or WAMP, consists of competition-level math problems annotated with "knowledge pieces", or general math facts, and "hints", or problem-specific tricks. This "web" of problems is structured such that each problem is represented by a node in the graph and is connected to its associated nodes of knowledge pieces and hints. In addition to graph construction, we manually annotate step-wise solutions for each problem. As an initial step, we apply this dataset to investigate whether augmenting the problem with relevant knowledge pieces and hints can improve model performance.
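
To make the graph structure concrete, the following is a minimal Python sketch of how a problem node might be linked to knowledge-piece and hint nodes, and how a prompt might be augmented with them. It is an illustration only, not the authors' implementation or data format: the class `ProblemWeb`, its methods, the identifiers `P1`, `K1`, `H1`, and the example problem text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemWeb:
    """Hypothetical toy graph: problem nodes linked to annotation nodes."""
    problems: dict = field(default_factory=dict)   # problem id -> problem text
    knowledge: dict = field(default_factory=dict)  # id -> general math fact
    hints: dict = field(default_factory=dict)      # id -> problem-specific trick
    edges: dict = field(default_factory=dict)      # problem id -> set of annotation ids

    def add_problem(self, pid, text):
        self.problems[pid] = text
        self.edges.setdefault(pid, set())

    def link(self, pid, annotation_id):
        """Connect a problem node to an existing knowledge-piece or hint node."""
        if pid not in self.problems:
            raise KeyError(f"unknown problem: {pid}")
        if annotation_id not in self.knowledge and annotation_id not in self.hints:
            raise KeyError(f"unknown annotation: {annotation_id}")
        self.edges[pid].add(annotation_id)

    def augmented_prompt(self, pid):
        """Build a prompt from the problem text plus its linked annotations."""
        linked = sorted(self.edges[pid])  # sorted for a deterministic order
        facts = [self.knowledge[a] for a in linked if a in self.knowledge]
        tricks = [self.hints[a] for a in linked if a in self.hints]
        parts = ["Problem: " + self.problems[pid]]
        if facts:
            parts.append("Relevant facts:\n" + "\n".join("- " + f for f in facts))
        if tricks:
            parts.append("Hints:\n" + "\n".join("- " + t for t in tricks))
        return "\n\n".join(parts)


# Made-up example content, not taken from the dataset.
web = ProblemWeb()
web.knowledge["K1"] = "AM-GM: for nonnegative a and b, (a + b) / 2 >= sqrt(a * b)."
web.hints["H1"] = "Apply AM-GM to the pair x and 1/x."
web.add_problem("P1", "For x > 0, find the minimum of x + 1/x.")
web.link("P1", "K1")
web.link("P1", "H1")
prompt = web.augmented_prompt("P1")
```

Because knowledge pieces are general facts, one knowledge node could be linked to several problem nodes in such a structure, while a hint would typically attach to a single problem.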

