

Poster

Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Surbhi Goel · Sushrut Karmalkar · Adam Klivans

East Exhibition Hall B, C #235

Keywords: [ Hardness of Learning and Approximations ] [ Theory ] [ Computational Complexity ]


Abstract: We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathrm{opt} < 1$ be the population loss of the best-fitting ReLU. We prove:

- Finding a ReLU with square loss $\mathrm{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.
- There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathrm{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss.

Prior work due to Soltanolkotabi [2017] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.
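To make the setup concrete, below is a minimal Python sketch of the learning problem the abstract describes: examples drawn from a spherical Gaussian, a ReLU hypothesis $x \mapsto \max(0, \langle w, x \rangle)$, its empirical square loss, and plain gradient descent on that loss. This is only an illustration of the problem, not the paper's algorithm; all names and parameters (d, n, lr, the realizable labels) are illustrative assumptions. In the realizable case shown here, Soltanolkotabi's result says gradient descent can succeed; the paper's hardness result concerns the agnostic case, where the labels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5000  # hypothetical dimension and sample size

X = rng.standard_normal((n, d))       # examples with spherical Gaussian marginals
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
y = np.maximum(X @ w_star, 0.0)       # realizable labels; in general they are arbitrary

def square_loss(w):
    """Empirical square loss of the ReLU hypothesis x -> max(0, <w, x>)."""
    return np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

# Plain gradient descent on the (non-convex) empirical square loss.
w = 0.1 * rng.standard_normal(d)
lr = 0.5
for _ in range(500):
    z = X @ w
    # (sub)gradient: 2/n * sum_i (relu(z_i) - y_i) * 1[z_i > 0] * x_i
    grad = (2.0 / n) * X.T @ ((np.maximum(z, 0.0) - y) * (z > 0))
    w -= lr * grad

print(f"empirical square loss after GD: {square_loss(w):.4f}")
```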
