Implicit Sparse Regularization: The Impact of Depth and Early Stopping
Jiangyuan Li · Thanh Nguyen · Chinmay Hegde · Raymond K. W. Wong

Wed Dec 08 12:30 AM -- 02:00 AM (PST)
In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-$N$ networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter $N$, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization $w_0$ and step size $\eta$. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window so that this implicit sparse regularization effect is more likely to take place.
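The setting described in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the paper: it assumes the common depth-$N$ diagonal parametrization $w = u^N - v^N$ (elementwise) with both factors initialized at a small constant `alpha`, a squared loss, a Gaussian design, and early stopping chosen by held-out validation error. The function names, problem sizes, and hyperparameters are all hypothetical choices made for illustration.

```python
import numpy as np


def make_sparse_regression(n, p, k, sigma, rng):
    """Gaussian design with a k-sparse ground truth and additive noise.
    (Illustrative data model; not the paper's experimental setup.)"""
    X = rng.standard_normal((n, p))
    w_star = np.zeros(p)
    w_star[:k] = 1.0
    y = X @ w_star + sigma * rng.standard_normal(n)
    return X, y, w_star


def early_stopped_gd(X, y, X_val, y_val, depth=3, alpha=0.1, eta=0.01, steps=2000):
    """Gradient descent on the loss ||X w - y||^2 / (2n) with w = u**depth - v**depth.

    Returns the iterate with the lowest validation MSE, the step at which it
    occurred, and that validation MSE. The parametrization and the
    validation-based stopping rule are assumptions of this sketch.
    """
    n, p = X.shape
    u = np.full(p, alpha)  # small, identical initialization, so w starts at 0
    v = np.full(p, alpha)
    best_w, best_step, best_val = np.zeros(p), 0, np.inf

    for t in range(1, steps + 1):
        w = u**depth - v**depth
        g = X.T @ (X @ w - y) / n  # gradient of the loss with respect to w
        # Chain rule: dL/du = depth * u^(depth-1) * g, dL/dv = -depth * v^(depth-1) * g
        u = u - eta * depth * u ** (depth - 1) * g
        v = v + eta * depth * v ** (depth - 1) * g

        w = u**depth - v**depth
        val = np.mean((X_val @ w - y_val) ** 2)
        if val < best_val:  # keep the best iterate seen so far
            best_w, best_step, best_val = w.copy(), t, val

    return best_w, best_step, best_val


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, w_star = make_sparse_regression(n=200, p=200, k=3, sigma=0.5, rng=rng)
    X_tr, y_tr, X_val, y_val = X[:100], y[:100], X[100:], y[100:]
    w_hat, step, val = early_stopped_gd(X_tr, y_tr, X_val, y_val)
    print("stopped at step", step, "validation MSE", val)
    print("estimation error", np.linalg.norm(w_hat - w_star))
```

Varying `depth`, `alpha`, and `steps` in this sketch is one way to explore informally how the initialization scale and the stopping time interact; it makes no claim to reproduce the paper's rates or experiments.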

Author Information

Jiangyuan Li (Texas A&M University)
Thanh Nguyen (AWS)
Chinmay Hegde (New York University)
Raymond K. W. Wong (Texas A&M University)

More from the Same Authors