On Training Implicit Models
Zhengyang Geng · Xin-Yu Zhang · Shaojie Bai · Yisen Wang · Zhouchen Lin

Fri Dec 10 08:30 AM -- 10:00 AM (PST)
This paper focuses on training implicit models with infinitely many layers. Previous works employ implicit differentiation and solve for the exact gradient in backward propagation. However, is it necessary to compute such an exact but expensive gradient for training? In this work, we propose a novel gradient estimate for implicit models, named the phantom gradient, that 1) forgoes the costly computation of the exact gradient; and 2) provides an update direction that is empirically preferable for training implicit models. We theoretically analyze the condition under which an ascent direction of the loss landscape can be found, and we provide two specific instantiations of the phantom gradient, based on damped unrolling and on the Neumann series. Experiments on large-scale tasks demonstrate that these lightweight phantom gradients accelerate the backward passes in training implicit models by roughly 1.7$\times$, and even improve performance over approaches based on the exact gradient on ImageNet.
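To make the contrast between the exact and the phantom gradient concrete, here is a minimal NumPy sketch on a toy problem. It is an illustration under stated assumptions, not the authors' implementation: the layer is assumed to be linear, h = W h + b, with W scaled to be a contraction; the loss is assumed to be 0.5 * ||h* - y||^2; and the gradient is taken with respect to b only. The function names and the values of k and lam (the damping factor) are hypothetical. The exact gradient solves a linear system with (I - W)^T, while the Neumann-style estimate replaces that solve with a k-term truncated, damped series.

```python
import numpy as np

def fixed_point(W, b, iters=200):
    # Forward pass of the toy implicit layer: iterate h <- W h + b to convergence.
    h = np.zeros_like(b)
    for _ in range(iters):
        h = W @ h + b
    return h

def exact_grad_b(W, g):
    # Implicit differentiation: dL/db = (I - W)^{-T} g, where g = dL/dh at h*.
    return np.linalg.solve((np.eye(len(g)) - W).T, g)

def phantom_grad_b(W, g, k=5, lam=0.8):
    # Truncated damped Neumann series (assumed form):
    #   lam * sum_{t=0}^{k-1} (lam * W^T + (1 - lam) * I)^t g
    # As k grows, this approaches the exact gradient above.
    M = lam * W.T + (1.0 - lam) * np.eye(len(g))
    term = g.copy()
    total = np.zeros_like(g)
    for _ in range(k):
        total += term
        term = M @ term
    return lam * total

rng = np.random.default_rng(0)
n = 6
W = rng.standard_normal((n, n))
W *= 0.5 / np.linalg.norm(W, 2)   # rescale to spectral norm 0.5 (a contraction)
b = rng.standard_normal(n)
y = rng.standard_normal(n)

h_star = fixed_point(W, b)
g = h_star - y                     # gradient of 0.5 * ||h - y||^2 at h*

exact = exact_grad_b(W, g)
phantom = phantom_grad_b(W, g, k=5, lam=0.8)
cosine = phantom @ exact / (np.linalg.norm(phantom) * np.linalg.norm(exact))
print("cosine(phantom, exact) =", cosine)
```

In this toy setting the truncated estimate needs only k matrix-vector products instead of a linear solve, and it stays positively aligned with the exact gradient; increasing k drives the two together.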

Author Information

Zhengyang Geng (Peking University)
Xin-Yu Zhang (TuSimple)

Xin-Yu Zhang is an algorithm researcher at TuSimple. His research interests mostly lie in machine learning and computer vision.

Shaojie Bai (Carnegie Mellon University)
Yisen Wang (Peking University)
Zhouchen Lin (Peking University)
