Reversal Is Structural: Concept-Aware Post-Training Recovers Rare, Deep Mathematical Skills
Abstract
Solution-based post-training, ranging from self-taught rationales to iterative preference learning, is credited with improving mathematical problem solving in large language models. Recent evidence, however, shows a "self-improvement reversal": as pass@1 rises, models can lose breadth and robustness. We argue this reversal is structural, not a statistical quirk. Drawing on Knowledge Space Theory, we recast mathematical reasoning as movement over a prerequisite concept graph and induce a sparse problem-concept mapping via an automatic pipeline, AutoKST, enabling concept-aware diagnostics beyond accuracy. Applied to challenging math benchmarks, AutoKST reveals that regressions localize to the graph's fringe: rare, prerequisite-heavy skills that headline scores overlook. A linearized view of post-training explains why frequency-skewed updates with coupled gradients naturally drift away from these low-frequency directions. Guided by this account, we propose Fringe-Theorem Training (FTT), a lightweight regimen that combines frequency-aware loss reweighting, projection-based gradient safeguards, and a fringe-focused micro-curriculum. In controlled studies, FTT improves pass@1 while restoring fringe competence and prerequisite adherence; for example, compared with STaR it substantially raises fringe performance and coverage, improves consistency, lowers calibration error, and, when paired with early stopping, reduces reasoning-token usage. By turning post-training evaluation into concept-level measurement, our framework distinguishes genuine self-improvement from structural regress and offers a practical path to the former.
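
For concreteness, the following is a minimal sketch of the two training-time components named in the abstract, frequency-aware loss reweighting and a projection-based gradient safeguard. The function names, the exponent alpha, and the PCGrad-style projection rule are illustrative assumptions under which such a regimen could be instantiated, not the exact implementation described in the paper.

import torch

def frequency_weighted_loss(per_example_loss, concept_freq, alpha=0.5):
    # per_example_loss: (B,) loss per training problem
    # concept_freq: (B,) empirical frequency of each problem's rarest prerequisite concept
    weights = concept_freq.clamp_min(1e-6).pow(-alpha)  # rarer concepts get larger weights
    weights = weights / weights.mean()                   # preserve the overall loss scale
    return (weights * per_example_loss).mean()

def project_out_conflict(grad_main, grad_fringe):
    # Safeguard in the spirit of PCGrad: if the main update opposes the gradient
    # computed on fringe-concept problems, remove the conflicting component so
    # the update cannot push fringe skills backwards.
    dot = torch.dot(grad_main.flatten(), grad_fringe.flatten())
    if dot < 0:
        grad_main = grad_main - (dot / grad_fringe.norm().pow(2)) * grad_fringe
    return grad_main

In this sketch, a fringe-focused micro-curriculum would simply feed batches whose concept_freq values are small, so both mechanisms act on the same low-frequency directions the diagnostics flag.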