Workshop: Trustworthy and Socially Responsible Machine Learning

Group Excess Risk Bound of Overparameterized Linear Regression with Constant-Stepsize SGD

Arjun Subramonian · Levent Sagun · Kai-Wei Chang · Yizhou Sun


It has been observed that machine learning models trained using stochastic gradient descent (SGD) generalize poorly to certain groups, both within and outside the population from which training instances are sampled. This has serious ramifications for the fairness, privacy, robustness, and out-of-distribution (OOD) generalization of machine learning. We therefore theoretically characterize how well overparameterized linear regression learned by SGD inherently generalizes to intra- and extra-population groups. We do this by proving an excess risk bound for an arbitrary group in terms of the full eigenspectra of the data covariance matrices of the group and the population. We also provide a novel interpretation of the bound in terms of how the group and population data distributions differ and the effective dimension of SGD, and we connect these factors to real-world challenges in practicing trustworthy machine learning. Finally, we empirically validate the tightness of our bound on simulated data.
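To make the quantity the bound controls concrete, here is a minimal sketch, not the authors' experimental setup or their bound. It assumes a well-specified linear model with Gaussian inputs, a hypothetical power-law population spectrum, and a hypothetical group whose covariance reverses that spectrum. Under these assumptions the excess risk of an estimate w on a group with covariance H_g is (w - w*)ᵀ H_g (w - w*). The sketch uses single-pass constant-stepsize SGD with iterate averaging. Averaging is an assumption here; the abstract does not say which SGD variant is analyzed. The function names `averaged_sgd` and `group_excess_risk` are hypothetical.

```python
import numpy as np


def averaged_sgd(X, y, stepsize):
    """One pass of constant-stepsize SGD on squared loss, starting at w = 0.

    Returns the average of the n iterates. Iterate averaging is an assumption
    of this sketch, not necessarily the variant analyzed in the paper.
    """
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for i in range(n):
        # Gradient of 0.5 * (x_i^T w - y_i)^2 with respect to w.
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - stepsize * grad
        w_sum += w
    return w_sum / n


def group_excess_risk(w, w_star, H_group):
    """Excess risk on a group with covariance H_group (well-specified model):
    E_x[(x^T w - x^T w_star)^2] = (w - w_star)^T H_group (w - w_star)."""
    diff = w - w_star
    return float(diff @ H_group @ diff)


rng = np.random.default_rng(0)
n, d, sigma = 100, 200, 0.1  # d > n: overparameterized (hypothetical sizes)

# Hypothetical population spectrum lambda_i = i^{-2}, diagonal covariance.
pop_eigs = 1.0 / np.arange(1, d + 1) ** 2
H_pop = np.diag(pop_eigs)

# Hypothetical group whose variance concentrates in the directions the
# population rarely varies in (reversed spectrum).
H_group = np.diag(pop_eigs[::-1])

w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d)) * np.sqrt(pop_eigs)  # rows x ~ N(0, H_pop)
y = X @ w_star + sigma * rng.standard_normal(n)

# Constant stepsize kept below 1 / tr(H_pop), assumed here for stability.
stepsize = 0.5 / np.trace(H_pop)
w_hat = averaged_sgd(X, y, stepsize)

pop_risk = group_excess_risk(w_hat, w_star, H_pop)
group_risk = group_excess_risk(w_hat, w_star, H_group)
print(f"population excess risk: {pop_risk:.4f}")
print(f"group excess risk:      {group_risk:.4f}")
```

You can swap `H_group` for other hypothetical spectra to see how a mismatch between the group and population covariances changes the group excess risk. The abstract identifies this mismatch as one of the factors driving the bound.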

Chat is not available.