Workshop: OPT 2023: Optimization for Machine Learning

An alternative approach to train neural networks using monotone variational inequality

Chen Xu · Xiuyuan Cheng · Yao Xie

Abstract: We investigate an alternative approach to training neural networks, a non-convex optimization problem, by viewing it through the lens of a convex one: solving a monotone variational inequality (MVI), inspired by the work of [Juditsky and Nemirovsky, 2019]. MVI solutions can be found by computationally efficient procedures. In the theoretical setting of training a single-layer linear neural network, these procedures come with performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery and prediction accuracy. To use MVI for training multi-layer neural networks, we propose a practical and completely general algorithm called stochastic variational inequality (SVI), and we demonstrate its applicability to networks with various architectures. On both synthetic and real-data prediction tasks, SVI performs competitively with or better than the widely used stochastic gradient descent (SGD) method across various performance metrics, with notably improved efficiency in the early stage of training.
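To make "solving an MVI" concrete, here is a minimal sketch under stated assumptions. It is not the SVI algorithm from this work. It assumes a single-layer model $\mathbb{E}[y \mid x] = \sigma(x^\top\theta)$ with a nondecreasing activation $\sigma$ (a sigmoid here). The field $F(\theta) = \mathbb{E}[x(\sigma(x^\top\theta) - y)]$ is then monotone, because $\langle F(\theta_1) - F(\theta_2), \theta_1 - \theta_2\rangle = \mathbb{E}[(\sigma(x^\top\theta_1) - \sigma(x^\top\theta_2))(x^\top\theta_1 - x^\top\theta_2)] \ge 0$, and it vanishes at the true parameter. The sketch runs a plain stochastic-approximation step $\theta \leftarrow \theta - \eta F_{\text{batch}}(\theta)$. The function names `mvi_field` and `solve_mvi`, the step size, batch size and synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mvi_field(theta, X, y):
    """Hypothetical empirical operator F(theta) = mean_i x_i * (sigmoid(x_i^T theta) - y_i).

    It is monotone in theta because sigmoid is nondecreasing.
    """
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


def solve_mvi(X, y, lr=0.5, epochs=50, batch=32, seed=0):
    """Stochastic approximation for F(theta) = 0 using mini-batch estimates of F."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            theta -= lr * mvi_field(theta, X[idx], y[idx])
    return theta


# Assumed synthetic single-layer data: y = sigmoid(X @ theta_true) + small Gaussian noise.
rng = np.random.default_rng(1)
n, d = 2000, 5
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = sigmoid(X @ theta_true) + 0.05 * rng.standard_normal(n)

theta_hat = solve_mvi(X, y)
print("l2 recovery error:  ", np.linalg.norm(theta_hat - theta_true))
print("linf recovery error:", np.max(np.abs(theta_hat - theta_true)))
```

Note one design point. The gradient of the squared loss $\tfrac12(\sigma(x^\top\theta) - y)^2$ carries an extra factor $\sigma'(x^\top\theta)$, and the monotone field above omits it. Without that factor the update is no longer a gradient step for that loss, but it still points toward the parameter at which $F$ vanishes.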

