

Poster

On the number of variables to use in principal component regression

Ji Xu · Daniel Hsu

East Exhibition Hall B, C #234

Keywords: [ Frequentist Statistics ] [ Theory ] [ Regularization ] [ Theory -> Large Deviations and Asymptotic Analysis; Theory ]


Abstract: We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides with the principal component regression estimator; when $p > n$, the estimator is the least $\ell_2$ norm solution over the selected features. We give an average-case analysis of the out-of-sample prediction error as $p, n, N \to \infty$ with $p/N \to \alpha$ and $n/N \to \beta$, for some constants $\alpha \in [0,1]$ and $\beta \in (0,1)$. In this average-case setting, the prediction error exhibits a "double descent" shape as a function of $p$. We also establish conditions under which the minimum risk is achieved in the interpolating ($p > n$) regime.
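
The estimator described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `fit_top_variance_features`, the variance profile, the true coefficient vector `theta`, the noise level, and the problem sizes are all hypothetical. The sketch uses the Moore-Penrose pseudoinverse, which gives the ordinary least squares solution when the selected columns have full column rank (typically $p \le n$) and the least $\ell_2$ norm interpolating solution when $p > n$.

```python
import numpy as np

def fit_top_variance_features(X, y, variances, p):
    """Fit on the p features with the largest variance (hypothetical helper).

    Returns a length-N coefficient vector that is zero on unselected features.
    """
    # Indices of features sorted by decreasing variance; keep the first p.
    selected = np.argsort(variances)[::-1][:p]
    # pinv yields least squares for full column rank, min l2-norm otherwise.
    coef_selected = np.linalg.pinv(X[:, selected]) @ y
    beta = np.zeros(X.shape[1])
    beta[selected] = coef_selected
    return beta

# Synthetic data under assumed (illustrative) settings.
rng = np.random.default_rng(0)
N, n = 60, 30
variances = 1.0 / np.arange(1, N + 1)        # assumed decaying variances
theta = rng.normal(size=N) / np.sqrt(N)      # assumed true coefficients
X = rng.normal(size=(n, N)) * np.sqrt(variances)  # uncorrelated Gaussian features
y = X @ theta + 0.1 * rng.normal(size=n)     # assumed noise level 0.1

def excess_risk(beta):
    # For independent zero-mean features with covariance diag(variances),
    # E[(x @ beta - x @ theta)^2] = sum_j variances[j] * (beta[j] - theta[j])^2.
    return float(np.sum(variances * (beta - theta) ** 2))

# Excess prediction error for several p, on both sides of p = n.
risks = {p: excess_risk(fit_top_variance_features(X, y, variances, p))
         for p in (5, 15, 29, 31, 45, 60)}
for p, r in risks.items():
    print(f"p = {p:2d}  excess risk = {r:.4f}")
```

A single random draw like this one only shows the mechanics of the estimator on either side of $p = n$; the average-case behavior described in the abstract concerns the limit of large $p$, $n$, $N$ and would require averaging over many draws at larger sizes.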
