

Poster

How regularization affects the critical points in linear networks

Amirhossein Taghvaei · Jin W Kim · Prashant Mehta

Pacific Ballroom #134

Keywords: [ Control Theory ] [ Optimization for Deep Networks ] [ Non-Convex Optimization ]


Abstract:

This paper is concerned with the problem of representing and learning a linear transformation using a linear neural network. In recent years, there has been growing interest in the study of such networks, in part due to the successes of deep learning. The main question of this body of research (and also of our paper) concerns the existence and optimality properties of the critical points of the mean-squared loss function. A further concern of our paper is the robustness of these critical points in the face of (a small amount of) regularization. An optimal control model is introduced for this purpose, and a learning algorithm (backprop with weight decay) is derived for it using Hamilton's formulation of optimal control. The formulation is used to provide a complete characterization of the critical points in terms of the solutions of a nonlinear matrix-valued equation, referred to as the characteristic equation. Analytical and numerical tools from bifurcation theory are used to compute the critical points via the solutions of the characteristic equation.
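As a concrete illustration of the setup the abstract describes, the following is a minimal sketch of learning a linear transformation with a two-layer linear network under mean-squared loss with weight decay (the regularizer whose effect on the critical points the paper studies). All names here (R, W1, W2, lam) and the choice of a two-layer network with identity input covariance are illustrative assumptions, not the paper's exact formulation.

import numpy as np

# Sketch: learn a target linear map R with a two-layer linear network
# x -> W2 @ W1 @ x, by gradient descent on the regularized loss
#   L = (1/2) ||W2 W1 - R||_F^2 + (lam/2) (||W1||_F^2 + ||W2||_F^2),
# i.e., backprop with weight decay. Names and sizes are illustrative.

rng = np.random.default_rng(0)
d, hidden = 4, 4
R = rng.standard_normal((d, d))              # target linear transformation
W1 = 0.1 * rng.standard_normal((hidden, d))  # first-layer weights
W2 = 0.1 * rng.standard_normal((d, hidden))  # second-layer weights

lam, lr = 1e-3, 0.05                         # weight decay and step size

for step in range(5000):
    E = W2 @ W1 - R                          # error of the composed map
    # Gradients of L (assuming identity input covariance),
    # including the weight-decay terms from the regularizer.
    g1 = W2.T @ E + lam * W1
    g2 = E @ W1.T + lam * W2
    W1 -= lr * g1
    W2 -= lr * g2

print("residual ||W2 W1 - R||_F =", np.linalg.norm(W2 @ W1 - R))

Setting g1 and g2 to zero yields coupled nonlinear matrix equations for the critical points; the paper's characteristic equation provides a characterization of the critical points of this kind of regularized problem, though its exact form (derived via Hamilton's formulation of optimal control) may differ from this toy sketch.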
