Loss regularization can help reduce the gap between training and test error by systematically limiting model complexity. Popular regularization techniques such as L2 weight regularization act directly on the network parameters, but do not explicitly take into account how the interplay between the parameters and the network architecture may affect the induced predictive functions. To address this shortcoming, we propose a simple technique for effective function-space regularization. Drawing on the result that fully-trained wide multi-layer perceptrons are equivalent to kernel regression under the Neural Tangent Kernel (NTK), we propose to approximate the norm of neural network functions by the reproducing kernel Hilbert space norm under the NTK and use it as a function-space regularizer. We prove that neural networks trained using this regularizer are arbitrarily close to kernel ridge regression solutions under the NTK. Furthermore, we provide a generalization error bound under the proposed regularizer and empirically demonstrate improved generalization and state-of-the-art performance on downstream tasks where effective regularization on the induced space of functions is essential.
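To make the quantities named in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a two-layer ReLU network f(x) = a · relu(W x) / sqrt(m) at finite width, builds its empirical NTK Gram matrix on a small random dataset, fits kernel ridge regression under that kernel, and evaluates a squared RKHS norm of the form f^T K^{-1} f on the training points. The function names, the toy data, the ridge strength `lam`, and the `jitter` term are all assumptions introduced for illustration; the paper's actual approximation of the norm may differ.

```python
import numpy as np


def empirical_ntk(X, W, a):
    """Empirical NTK Gram matrix for f(x) = a . relu(W x) / sqrt(m).

    Assumed toy architecture: one hidden layer of width m, no biases.
    """
    m = W.shape[0]
    pre = X @ W.T                        # (n, m) pre-activations
    act = np.maximum(pre, 0.0)           # relu(W x)
    gate = (pre > 0).astype(X.dtype)     # derivative of relu
    # Contribution of gradients w.r.t. the output weights a.
    k_a = act @ act.T / m
    # Contribution of gradients w.r.t. the hidden weights W:
    # sum_j a_j^2 * gate_j(x) * gate_j(x') * <x, x'> / m.
    k_w = ((gate * a**2) @ gate.T) * (X @ X.T) / m
    return k_a + k_w


def kernel_ridge_fit(K, y, lam):
    """Dual coefficients alpha = (K + lam I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def rkhs_sq_norm(K, f_vals, jitter=1e-6):
    """Squared RKHS norm f^T K^{-1} f of the minimum-norm interpolant
    of f_vals on the training points (jitter added for stability)."""
    n = K.shape[0]
    return float(f_vals @ np.linalg.solve(K + jitter * np.eye(n), f_vals))


# Toy data and random network parameters (all hypothetical).
rng = np.random.default_rng(0)
n, d, m = 20, 3, 512
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])
W = rng.normal(size=(m, d))
a = rng.normal(size=m)

K = empirical_ntk(X, W, a)
alpha = kernel_ridge_fit(K, y, lam=0.1)
f_train = K @ alpha                      # kernel ridge predictions on X
penalty = rkhs_sq_norm(K, f_train)       # function-space penalty term
```

In this sketch, `penalty` plays the role that an L2 weight penalty would play in parameter space: it measures the size of the predictive function itself under the kernel rather than the size of the weights.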
Author Information
Zonghao Chen (Tsinghua University)
Xupeng Shi (Northeastern University)
Tim G. J. Rudner (University of Oxford)
Tim G. J. Rudner is a Computer Science PhD student at the University of Oxford supervised by Yarin Gal and Yee Whye Teh. His research interests span Bayesian deep learning, reinforcement learning, and variational inference. He obtained a master’s degree in statistics from the University of Oxford and an undergraduate degree in mathematics and economics from Yale University. Tim is also a Rhodes Scholar and a Fellow of the German National Academic Foundation.
Qixuan Feng (University of Oxford)
Weizhong Zhang (The Hong Kong University of Science and Technology)
Tong Zhang (The Hong Kong University of Science and Technology)
More from the Same Authors
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 : PCA Subspaces Are Not Always Optimal for Bayesian Learning »
Alexandre Bense · Amir Joudaki · Tim G. J. Rudner · Vincent Fortuin -
2021 : Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning »
Zachary Nado · Neil Band · Mark Collier · Josip Djolonga · Mike Dusenberry · Sebastian Farquhar · Qixuan Feng · Angelos Filos · Marton Havasi · Rodolphe Jenatton · Ghassen Jerfel · Jeremiah Liu · Zelda Mariet · Jeremy Nixon · Shreyas Padhy · Jie Ren · Tim G. J. Rudner · Yeming Wen · Florian Wenzel · Kevin Murphy · D. Sculley · Balaji Lakshminarayanan · Jasper Snoek · Yarin Gal · Dustin Tran -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2022 : Particle-based Variational Inference with Preconditioned Functional Gradient Flow »
Hanze Dong · Xi Wang · Yong Lin · Tong Zhang -
2022 : Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2022 : Poster Session 2 »
Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi -
2022 Poster: Tractable Function-Space Variational Inference in Bayesian Neural Networks »
Tim G. J. Rudner · Zonghao Chen · Yee Whye Teh · Yarin Gal -
2022 Poster: When is the Convergence Time of Langevin Algorithms Dimension Independent? A Composite Optimization Viewpoint »
Yoav S Freund · Yi-An Ma · Tong Zhang -
2022 Poster: Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity »
Alekh Agarwal · Tong Zhang -
2022 Poster: Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions »
Jiafan He · Dongruo Zhou · Tong Zhang · Quanquan Gu -
2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo -
2021 Poster: Efficient Neural Network Training via Forward and Backward Propagation Sparsification »
Xiao Zhou · Weizhong Zhang · Zonghao Chen · SHIZHE DIAO · Tong Zhang -
2021 Poster: Outcome-Driven Reinforcement Learning via Variational Inference »
Tim G. J. Rudner · Vitchyr Pong · Rowan McAllister · Yarin Gal · Sergey Levine -
2021 Poster: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations »
Tim G. J. Rudner · Cong Lu · Michael A Osborne · Yarin Gal · Yee Teh -
2020 Poster: How to Characterize The Landscape of Overparameterized Convolutional Neural Networks »
Yihong Gu · Weizhong Zhang · Cong Fang · Jason Lee · Tong Zhang -
2019 : Poster session »
Sebastian Farquhar · Erik Daxberger · Andreas Look · Matt Benatan · Ruiyi Zhang · Marton Havasi · Fredrik Gustafsson · James A Brofos · Nabeel Seedat · Micha Livne · Ivan Ustyuzhaninov · Adam Cobb · Felix D McGregor · Patrick McClure · Tim R. Davidson · Gaurush Hiranandani · Sanjeev Arora · Masha Itkina · Didrik Nielsen · William Harvey · Matias Valdenegro-Toro · Stefano Peluchetti · Riccardo Moriconi · Tianyu Cui · Vaclav Smidl · Taylan Cemgil · Jack Fitzsimons · He Zhao · · mariana vargas vieyra · Apratim Bhattacharyya · Rahul Sharma · Geoffroy Dubourg-Felonneau · Jonathan Warrell · Slava Voloshynovskiy · Mihaela Rosca · Jiaming Song · Andrew Ross · Homa Fashandi · Ruiqi Gao · Hooshmand Shokri Razaghi · Joshua Chang · Zhenzhong Xiao · Vanessa Boehm · Giorgio Giannone · Ranganath Krishnan · Joe Davison · Arsenii Ashukha · Jeremiah Liu · Sicong (Sheldon) Huang · Evgenii Nikishin · Sunho Park · Nilesh Ahuja · Mahesh Subedar · · Artyom Gadetsky · Jhosimar Arias Figueroa · Tim G. J. Rudner · Waseem Aslam · Adrián Csiszárik · John Moberg · Ali Hebbal · Kathrin Grosse · Pekka Marttinen · Bang An · Hlynur Jónsson · Samuel Kessler · Abhishek Kumar · Mikhail Figurnov · Omesh Tickoo · Steindor Saemundsson · Ari Heljakka · Dániel Varga · Niklas Heim · Simone Rossi · Max Laves · Waseem Gharbieh · Nicholas Roberts · Luis Armando Pérez Rey · Matthew Willetts · Prithvijit Chakrabarty · Sumedh Ghaisas · Carl Shneider · Wray Buntine · Kamil Adamczewski · Xavier Gitiaux · Suwen Lin · Hao Fu · Gunnar Rätsch · Aidan Gomez · Erik Bodin · Dinh Phung · Lennart Svensson · Juliano Tusi Amaral Laganá Pinto · Milad Alizadeh · Jianzhun Du · Kevin Murphy · Beatrix Benkő · Shashaank Vattikuti · Jonathan Gordon · Christopher Kanan · Sontje Ihler · Darin Graham · Michael Teng · Louis Kirsch · Tomas Pevny · Taras Holotyak -
2019 Poster: VIREL: A Variational Inference Framework for Reinforcement Learning »
Mattie Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson -
2019 Spotlight: VIREL: A Variational Inference Framework for Reinforcement Learning »
Mattie Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson -
2018 Poster: Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning »
Xing Yan · Weizhong Zhang · Lin Ma · Wei Liu · Qi Wu -
2017 : Poster session »
Xun Zheng · Tim G. J. Rudner · Christopher Tegho · Patrick McClure · Yunhao Tang · ASHWIN D'CRUZ · Juan Camilo Gamboa Higuera · Chandra Sekhar Seelamantula · Jhosimar Arias Figueroa · Andrew Berlin · Maxime Voisin · Alexander Amini · Thang Long Doan · Hengyuan Hu · Aleksandar Botev · Niko Suenderhauf · CHI ZHANG · John Lambert -
2017 Poster: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Oral: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Poster: On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning »
Xingguo Li · Lin Yang · Jason Ge · Jarvis Haupt · Tong Zhang · Tuo Zhao