On Convexity and Linear Mode Connectivity in Neural Networks
David Yunis · Kumar Kshitij Patel · Pedro Savarese · Gal Vardi · Jonathan Frankle · Matthew Walter · Karen Livescu · Michael Maire

In many cases, neural networks trained with stochastic gradient descent (SGD) that share an early and often small portion of the training trajectory have solutions connected by a linear path of low loss. This phenomenon, called linear mode connectivity (LMC), has been leveraged for pruning and model averaging in large neural network models, but it is not well understood how broadly or why it occurs. LMC suggests that SGD trajectories somehow end up in a \textit{convex} region of the loss landscape and stay there. In this work, we confirm that this eventually does happen by finding a high-dimensional convex hull of low loss between the endpoints of several SGD trajectories. But to our surprise, simple measures of convexity do not show any obvious transition at the point when SGD will converge into this region. To understand this convex hull better, we investigate the functional behaviors of its endpoints. We find that only a small number of correct predictions are shared between all endpoints of a hull, and an even smaller number of correct predictions are shared between the hulls, even when the final accuracy is high for every endpoint. Thus, we tie LMC more tightly to convexity, and raise several new questions about the source of this convexity in neural network optimization.
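The LMC check the abstract describes amounts to evaluating the loss along the straight line between two trained solutions and measuring how far it rises above the endpoints (the "barrier"). A minimal sketch of that test, using an underdetermined least-squares problem as a toy stand-in for a network loss (all function and variable names here are illustrative, not from the paper): because this toy loss is convex, any two minimizers are connected by a zero-barrier line, which is exactly the behavior LMC attributes to SGD solutions in the convex region.

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error of a linear model; a stand-in for a network loss.
    return float(np.mean((X @ w - y) ** 2))

def linear_path_losses(w_a, w_b, X, y, steps=11):
    # Loss along the straight line (1 - a) * w_a + a * w_b between two solutions.
    return [loss((1 - a) * w_a + a * w_b, X, y)
            for a in np.linspace(0.0, 1.0, steps)]

def barrier(losses):
    # The "barrier" is how far the interior loss rises above the endpoints;
    # LMC holds (informally) when it stays near zero.
    return max(losses) - max(losses[0], losses[-1])

# Underdetermined system: 5 equations, 10 unknowns, so there are many
# exact minimizers, and the set of minimizers is convex.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
y = rng.normal(size=5)

w_a, *_ = np.linalg.lstsq(X, y, rcond=None)   # min-norm solution
_, _, Vt = np.linalg.svd(X)
w_b = w_a + 2.0 * Vt[-1]                      # shift along the null space of X

print(barrier(linear_path_losses(w_a, w_b, X, y)))
```

The printed barrier is numerically zero: the whole segment between `w_a` and `w_b` sits at the loss minimum. The paper's contribution goes further, testing not just lines but a high-dimensional convex hull spanned by several trajectory endpoints.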