Timezone: »

Invited Talk: Roger Grosse - Why Isn’t Everyone Using Second-Order Optimization?
Roger Grosse

In the pre-AlexNet days of deep learning, second-order optimization gave dramatic speedups and enabled training of deep architectures that seemed to be inaccessible to first-order optimization. But today, despite algorithmic advances such as K-FAC, nearly all modern neural net architectures are trained with variants of SGD and Adam. What’s holding us back from using second-order optimization? I’ll discuss three challenges to applying second-order optimization to modern neural nets: difficulty of implementation, implicit regularization effects of gradient descent, and the effect of gradient noise. All of these factors are significant, though not in the ways commonly believed.

Author Information

Roger Grosse (University of Toronto)

More from the Same Authors