A Neural Tangent Kernel Perspective on Function-Space Regularization in Neural Networks
Zonghao Chen · Xupeng Shi · Tim G. J. Rudner · Qixuan Feng · Weizhong Zhang · Tong Zhang
Event URL: https://openreview.net/forum?id=E6MGIXQlKw

Loss regularization can help reduce the gap between training and test error by systematically limiting model complexity. Popular regularization techniques such as L2 weight regularization act directly on the network parameters, but do not explicitly take into account how the interplay between the parameters and the network architecture may affect the induced predictive functions. To address this shortcoming, we propose a simple technique for effective function-space regularization. Drawing on the result that fully-trained wide multi-layer perceptrons are equivalent to kernel regression under the Neural Tangent Kernel (NTK), we propose to approximate the norm of neural network functions by the reproducing kernel Hilbert space norm under the NTK and use it as a function-space regularizer. We prove that neural networks trained using this regularizer are arbitrarily close to kernel ridge regression solutions under the NTK. Furthermore, we provide a generalization error bound under the proposed regularizer and empirically demonstrate improved generalization and state-of-the-art performance on downstream tasks where effective regularization on the induced space of functions is essential.
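To make the objects named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-layer ReLU network with 1/sqrt(width) output scaling, computes its empirical NTK from analytic parameter gradients, fits kernel ridge regression under that kernel, and evaluates the squared RKHS norm of the fitted function. All function names and the choice of architecture are hypothetical and chosen only for illustration.

```python
import numpy as np

def init_params(d_in, width, rng):
    # Hypothetical two-layer network: f(x) = a^T relu(W x) / sqrt(width)
    W = rng.standard_normal((width, d_in))
    a = rng.standard_normal(width)
    return W, a

def param_gradients(X, W, a):
    # Gradient of f(x) with respect to all parameters (a, W), one row per input.
    width = W.shape[0]
    pre = X @ W.T                                  # (n, width) pre-activations
    act = np.maximum(pre, 0.0)                     # relu(W x)
    dact = (pre > 0).astype(X.dtype)               # relu'(W x)
    grad_a = act / np.sqrt(width)                  # df/da, shape (n, width)
    # df/dW[j, :] = a_j * relu'(pre_j) * x / sqrt(width), shape (n, width, d_in)
    grad_W = (dact * a)[:, :, None] * X[:, None, :] / np.sqrt(width)
    return np.concatenate([grad_a, grad_W.reshape(len(X), -1)], axis=1)

def empirical_ntk(X1, X2, W, a):
    # Empirical NTK: inner product of parameter gradients at two sets of inputs.
    return param_gradients(X1, W, a) @ param_gradients(X2, W, a).T

def krr_fit(K, y, lam):
    # Kernel ridge regression coefficients: alpha = (K + lam * I)^{-1} y
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(K_test_train, alpha):
    # Prediction f(x) = sum_i alpha_i k(x, x_i)
    return K_test_train @ alpha

def rkhs_norm_sq(K, alpha):
    # Squared RKHS norm of f = sum_i alpha_i k(., x_i) is alpha^T K alpha.
    return float(alpha @ K @ alpha)
```

In this sketch, increasing the ridge parameter `lam` shrinks the RKHS norm of the fitted function, which is the sense in which the norm acts as a complexity measure in function space. How the paper approximates this norm for a trained network and adds it to the training loss is not specified in the abstract and is not reproduced here.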

Author Information

Zonghao Chen (Tsinghua University)
Xupeng Shi (Northeastern University)
Tim G. J. Rudner (University of Oxford)

Tim G. J. Rudner is a Computer Science PhD student at the University of Oxford supervised by Yarin Gal and Yee Whye Teh. His research interests span Bayesian deep learning, reinforcement learning, and variational inference. He obtained a master’s degree in statistics from the University of Oxford and an undergraduate degree in mathematics and economics from Yale University. Tim is also a Rhodes Scholar and a Fellow of the German National Academic Foundation.

Qixuan Feng (University of Oxford)
Weizhong Zhang (The Hong Kong University of Science and Technology)
Tong Zhang (The Hong Kong University of Science and Technology)

More from the Same Authors