The Effect of Dataset Diversification on Mathematical Problem Solving Performance
Abstract
We investigate the impact of dataset diversification on mathematical problem-solving performance and find that controlling diversity can substantially improve model capabilities. Using Farthest Point Sampling at different diversity levels, we observe performance gains of 12.7 percentage points on GSM8K and 12.4 percentage points on the MATH benchmark, with 25\% diversity performing best on both tasks. Our evaluation reveals that the relationship between diversity level and performance is non-monotonic: intermediate diversity levels outperform both random sampling and maximum diversity. Through experiments on the NuminaMath, Hendrycks MATH, and MATH-Plus datasets, we demonstrate that these improvements depend on training set size: training sets of 1k examples show minimal benefit from diversification, whereas sets of 3k examples exhibit substantial gains. We also find that diverse sampling can harm performance when it selects low-quality examples from noisy datasets, highlighting the importance of quality control in diversification strategies.
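For readers unfamiliar with Farthest Point Sampling, the following is a minimal sketch of the standard greedy procedure, assuming each problem is represented by an embedding vector and compared by Euclidean distance. The abstract does not specify how a diversity level such as 25\% is realized, so the `mixed_sample` helper and its `diversity` parameter are hypothetical: they illustrate one possible reading, in which that fraction of the subset is chosen by Farthest Point Sampling and the remainder uniformly at random.

```python
import math
import random


def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: return indices of k points, each new point chosen to
    maximize its distance to the nearest already-selected point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))  # arbitrary starting point
    selected = [first]
    # min_dist[i]: distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[first]) for p in points]
    min_dist[first] = -1.0  # mark as taken so it is never re-selected
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
        min_dist[nxt] = -1.0
    return selected


def mixed_sample(points, k, diversity, seed=0):
    """Hypothetical diversity control (an assumption, not the paper's
    stated method): take round(diversity * k) indices by FPS and fill
    the rest of the subset uniformly at random."""
    n_fps = round(diversity * k)
    chosen = farthest_point_sampling(points, n_fps, seed) if n_fps > 0 else []
    taken = set(chosen)
    remaining = [i for i in range(len(points)) if i not in taken]
    rng = random.Random(seed)
    return chosen + rng.sample(remaining, k - len(chosen))


if __name__ == "__main__":
    # Toy 2-D "embeddings": one cluster near the origin, one far away.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(farthest_point_sampling(pts, 2))
    print(mixed_sample(pts, 4, diversity=0.25))
```

Under this reading, `diversity=0.0` reduces to random sampling and `diversity=1.0` to pure Farthest Point Sampling. The greedy loop costs O(nk) distance computations, which is typically acceptable for subsets of a few thousand examples.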