
Workshop: Mathematics of Modern Machine Learning (M3L)

Gibbs-Based Information Criteria and the Over-Parameterized Regime

Haobo Chen · Yuheng Bu · Gregory Wornell

Abstract: Double descent refers to the unexpected drop in the test loss of a learning algorithm beyond the interpolation threshold as the model becomes over-parameterized. Information criteria in their classical forms fail to predict this behavior because of limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and derive the Bayesian Information Criterion (BIC) for models trained by the Gibbs algorithm. Notably, the BIC penalty term for the Gibbs algorithm corresponds to a specific information measure, namely the KL divergence. We extend this information-theoretic analysis to over-parameterized models by characterizing the Gibbs-based BIC for the random feature model in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal a mismatch between the marginal likelihood and the population risk in the over-parameterized regime, providing new insights into the double-descent phenomenon.
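The link between a Gibbs posterior, the marginal likelihood, and a KL-divergence penalty can be illustrated in the simplest conjugate setting. This is a minimal sketch under assumptions that are not taken from the paper. It assumes squared loss, a Gaussian prior N(0, prior_var I), an inverse temperature gamma (treated as a Gaussian noise precision so that the marginal likelihood is a proper density), and ReLU random features with a fixed Gaussian weight matrix. The names random_features, gibbs_posterior and evidence_decomposition are hypothetical. The sketch only checks the standard identity log p(y) = E_q[log p(y | theta)] - KL(q || prior), which holds exactly when q is the Gibbs posterior. It is not the authors' Gibbs-based BIC or their asymptotic characterization.

```python
import numpy as np


def random_features(X, p, rng):
    """Hypothetical random feature map: phi(x) = relu(W^T x / sqrt(d)) with fixed Gaussian W."""
    d = X.shape[1]
    W = rng.standard_normal((d, p))
    return np.maximum(X @ W / np.sqrt(d), 0.0)


def gibbs_posterior(Phi, y, gamma, prior_var):
    """Gibbs posterior for squared loss at inverse temperature gamma with a
    N(0, prior_var * I) prior (assumed); it is Gaussian N(mu, Sigma)."""
    p = Phi.shape[1]
    precision = gamma * Phi.T @ Phi + np.eye(p) / prior_var
    Sigma = np.linalg.inv(precision)
    mu = gamma * Sigma @ Phi.T @ y
    return mu, Sigma


def evidence_decomposition(Phi, y, gamma, prior_var):
    """Return (expected log-likelihood, KL penalty, log marginal likelihood)."""
    n, p = Phi.shape
    mu, Sigma = gibbs_posterior(Phi, y, gamma, prior_var)

    # Expected Gaussian log-likelihood (noise variance 1/gamma) under the Gibbs posterior.
    resid = y - Phi @ mu
    fit = resid @ resid + np.trace(Phi @ Sigma @ Phi.T)
    exp_loglik = 0.5 * n * np.log(gamma / (2 * np.pi)) - 0.5 * gamma * fit

    # Penalty term: KL(N(mu, Sigma) || N(0, prior_var * I)).
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    kl = 0.5 * ((np.trace(Sigma) + mu @ mu) / prior_var
                - p + p * np.log(prior_var) - logdet_Sigma)

    # Direct log marginal likelihood: y ~ N(0, prior_var * Phi Phi^T + I / gamma).
    C = prior_var * Phi @ Phi.T + np.eye(n) / gamma
    _, logdet_C = np.linalg.slogdet(C)
    log_evidence = -0.5 * (y @ np.linalg.solve(C, y) + logdet_C + n * np.log(2 * np.pi))
    return exp_loglik, kl, log_evidence


# Usage: sweep p at fixed n, i.e. vary the ratio p/n (illustrative data only).
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
for p in (50, 100, 200):  # p/n = 0.5, 1, 2
    Phi = random_features(X, p, rng)
    # prior_var = 1/p is an assumed scaling, not one taken from the paper.
    ell, kl, logZ = evidence_decomposition(Phi, y, gamma=10.0, prior_var=1.0 / p)
    print(p / n, ell - kl, logZ, kl)
```

The two evidence columns printed for each ratio should agree up to floating-point error, and the KL column isolates the penalty part of the decomposition. The closed forms are used here because the conjugate Gaussian case makes the identity checkable exactly. The paper's regime (p, n tending to infinity with p/n fixed) would require the asymptotic analysis it describes rather than finite-sample evaluation.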
