Timezone: »
A recent breakthrough in deep learning theory shows that the training of over-parameterized deep neural networks can be characterized by a kernel function called \textit{neural tangent kernel} (NTK). However, it is known that this type of results does not perfectly match the practice, as NTK-based analysis requires the network weights to stay very close to their initialization throughout training, and cannot handle regularizers or gradient noises. In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a ``kernel-like'' behavior. This implies that the training loss converges linearly up to a certain accuracy. We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
Author Information
Zixiang Chen (UCLA)
Yuan Cao (UCLA)
Quanquan Gu (UCLA)
Tong Zhang (The Hong Kong University of Science and Technology)
More from the Same Authors
-
2021 : Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization »
Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu -
2021 : Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization »
Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu -
2021 : Faster Perturbed Stochastic Gradient Methods for Finding Local Minima »
Zixiang Chen · Dongruo Zhou · Quanquan Gu -
2021 : Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium »
Chris Junchi Li · Dongruo Zhou · Quanquan Gu · Michael Jordan -
2022 : A Neural Tangent Kernel Perspective on Function-Space Regularization in Neural Networks »
Zonghao Chen · Xupeng Shi · Tim G. J. Rudner · Qixuan Feng · Weizhong Zhang · Tong Zhang -
2022 : Particle-based Variational Inference with Preconditioned Functional Gradient Flow »
Hanze Dong · Xi Wang · Yong Lin · Tong Zhang -
2022 : Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2022 : A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning »
Zixiang Chen · Chris Junchi Li · Angela Yuan · Quanquan Gu · Michael Jordan -
2022 Panel: Panel 6B-2: Depth is More… & Benign Overfitting in… »
Yuan Cao · Shen-Huan Lyu -
2022 Spotlight: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 : Contributed Talks 2 »
Quanquan Gu · Aaron Defazio · Jiajin Li -
2022 Workshop: OPT 2022: Optimization for Machine Learning »
Courtney Paquette · Sebastian Stich · Quanquan Gu · Cristóbal Guzmán · John Duchi -
2022 Poster: Towards Understanding the Mixture-of-Experts Layer in Deep Learning »
Zixiang Chen · Yihe Deng · Yue Wu · Quanquan Gu · Yuanzhi Li -
2022 Poster: Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs »
Dongruo Zhou · Quanquan Gu -
2022 Poster: Benign Overfitting in Two-layer Convolutional Neural Networks »
Yuan Cao · Zixiang Chen · Misha Belkin · Quanquan Gu -
2022 Poster: Learning Two-Player Markov Games: Neural Function Approximation and Correlated Equilibrium »
Chris Junchi Li · Dongruo Zhou · Quanquan Gu · Michael Jordan -
2022 Poster: A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits »
Jiafan He · Tianhao Wang · Yifei Min · Quanquan Gu -
2022 Poster: When is the Convergence Time of Langevin Algorithms Dimension Independent? A Composite Optimization Viewpoint »
Yoav S Freund · Yi-An Ma · Tong Zhang -
2022 Poster: The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity »
Alekh Agarwal · Tong Zhang -
2022 Poster: Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions »
Jiafan He · Dongruo Zhou · Tong Zhang · Quanquan Gu -
2022 Poster: Active Ranking without Strong Stochastic Transitivity »
Hao Lou · Tao Jin · Yue Wu · Pan Xu · Quanquan Gu · Farzad Farnoud -
2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo -
2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo -
2021 : Contributed talks in Session 4 (Zoom) »
Quanquan Gu · Agnieszka Słowik · Jacques Chen · Neha Wadia · Difan Zou -
2021 : Opening Remarks to Session 4 »
Quanquan Gu -
2021 : Poster Session 2 (gather.town) »
Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov -
2021 Workshop: OPT 2021: Optimization for Machine Learning »
Courtney Paquette · Quanquan Gu · Oliver Hinder · Katya Scheinberg · Sebastian Stich · Martin Takac -
2021 Poster: A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning »
Christoph Dann · Mehryar Mohri · Tong Zhang · Julian Zimmert -
2021 Poster: The Benefits of Implicit Regularization from SGD in Least Squares Problems »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Dean Foster · Sham Kakade -
2021 Poster: Efficient Neural Network Training via Forward and Backward Propagation Sparsification »
Xiao Zhou · Weizhong Zhang · Zonghao Chen · SHIZHE DIAO · Tong Zhang -
2021 Poster: Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Poster: Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent »
Spencer Frei · Quanquan Gu -
2021 Poster: Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures »
Yuan Cao · Quanquan Gu · Mikhail Belkin -
2021 Poster: Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Poster: Error Compensated Distributed SGD Can Be Accelerated »
Xun Qian · Peter Richtarik · Tong Zhang -
2021 Poster: Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation »
Weitong ZHANG · Dongruo Zhou · Quanquan Gu -
2021 Poster: Variance-Aware Off-Policy Evaluation with Linear Function Approximation »
Yifei Min · Tianhao Wang · Dongruo Zhou · Quanquan Gu -
2021 Poster: Iterative Teacher-Aware Learning »
Luyao Yuan · Dongruo Zhou · Junhong Shen · Jingdong Gao · Jeffrey L Chen · Quanquan Gu · Ying Nian Wu · Song-Chun Zhu -
2021 Poster: Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints »
Tianhao Wang · Dongruo Zhou · Quanquan Gu -
2021 Poster: Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks »
Hanxun Huang · Yisen Wang · Sarah Erfani · Quanquan Gu · James Bailey · Xingjun Ma -
2021 Poster: Do Wider Neural Networks Really Help Adversarial Robustness? »
Boxi Wu · Jinghui Chen · Deng Cai · Xiaofei He · Quanquan Gu -
2021 Poster: Pure Exploration in Kernel and Neural Bandits »
Yinglun Zhu · Dongruo Zhou · Ruoxi Jiang · Quanquan Gu · Rebecca Willett · Robert Nowak -
2020 : Closing remarks »
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac -
2020 : Contributed talks in Session 4 (Zoom) »
Quanquan Gu · sanae lotfi · Charles Guille-Escuret · Tolga Ergen · Dongruo Zhou -
2020 : Live Q&A with Deanna Needell and Hanbake Lyu (Zoom) »
Quanquan Gu -
2020 : Welcome remarks to Session 4 »
Quanquan Gu -
2020 : Invited speaker: The Convexity of Learning Infinite-width Deep Neural Networks, Tong Zhang »
Tong Zhang -
2020 Workshop: OPT2020: Optimization for Machine Learning »
Courtney Paquette · Mark Schmidt · Sebastian Stich · Quanquan Gu · Martin Takac -
2020 : Welcome event (gather.town) »
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac -
2020 Poster: Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets »
Kai Han · Yunhe Wang · Qiulin Zhang · Wei Zhang · Chunjing XU · Tong Zhang -
2020 Poster: Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts »
Guilin Li · Junlei Zhang · Yunhe Wang · Chuanjian Liu · Matthias Tan · Yunfeng Lin · Wei Zhang · Jiashi Feng · Tong Zhang -
2020 Poster: Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems »
Luo Luo · Haishan Ye · Zhichao Huang · Tong Zhang -
2020 Poster: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS »
Han Shi · Renjie Pi · Hang Xu · Zhenguo Li · James Kwok · Tong Zhang -
2020 Poster: Decentralized Accelerated Proximal Gradient Descent »
Haishan Ye · Ziang Zhou · Luo Luo · Tong Zhang -
2020 Poster: Agnostic Learning of a Single Neuron with Gradient Descent »
Spencer Frei · Yuan Cao · Quanquan Gu -
2020 Poster: A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods »
Yue Wu · Weitong ZHANG · Pan Xu · Quanquan Gu -
2020 Poster: How to Characterize The Landscape of Overparameterized Convolutional Neural Networks »
Yihong Gu · Weizhong Zhang · Cong Fang · Jason Lee · Tong Zhang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 Poster: Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks »
Spencer Frei · Yuan Cao · Quanquan Gu -
2019 Poster: Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks »
Difan Zou · Ziniu Hu · Yewen Wang · Song Jiang · Yizhou Sun · Quanquan Gu -
2019 Poster: Divergence-Augmented Policy Optimization »
Qing Wang · Yingru Li · Jiechao Xiong · Tong Zhang -
2019 Poster: Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction »
Difan Zou · Pan Xu · Quanquan Gu -
2019 Poster: Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks »
Yuan Cao · Quanquan Gu -
2019 Poster: An Improved Analysis of Training Over-parameterized Deep Neural Networks »
Difan Zou · Quanquan Gu -
2019 Poster: Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks »
Yuan Cao · Quanquan Gu -
2019 Spotlight: Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks »
Yuan Cao · Quanquan Gu -
2018 Poster: Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima »
Yaodong Yu · Pan Xu · Quanquan Gu -
2018 Poster: Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization »
Pan Xu · Jinghui Chen · Difan Zou · Quanquan Gu -
2018 Poster: Communication Compression for Decentralized Training »
Hanlin Tang · Shaoduo Gan · Ce Zhang · Tong Zhang · Ji Liu -
2018 Spotlight: Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization »
Pan Xu · Jinghui Chen · Difan Zou · Quanquan Gu -
2018 Poster: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator »
Cong Fang · Chris Junchi Li · Zhouchen Lin · Tong Zhang -
2018 Poster: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Spotlight: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator »
Cong Fang · Chris Junchi Li · Zhouchen Lin · Tong Zhang -
2018 Spotlight: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Poster: Stochastic Primal-Dual Method for Empirical Risk Minimization with O(1) Per-Iteration Complexity »
Conghui Tan · Tong Zhang · Shiqian Ma · Ji Liu -
2018 Poster: Exponentially Weighted Imitation Learning for Batched Historical Data »
Qing Wang · Jiechao Xiong · Lei Han · peng sun · Han Liu · Tong Zhang -
2018 Poster: Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization »
Bargav Jayaraman · Lingxiao Wang · David Evans · Quanquan Gu -
2018 Poster: Gradient Sparsification for Communication-Efficient Distributed Optimization »
Jianqiao Wangni · Jialei Wang · Ji Liu · Tong Zhang -
2017 Poster: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Poster: Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization »
Pan Xu · Jian Ma · Quanquan Gu -
2017 Oral: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Poster: On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning »
Xingguo Li · Lin Yang · Jason Ge · Jarvis Haupt · Tong Zhang · Tuo Zhao -
2016 Poster: Exact Recovery of Hard Thresholding Pursuit »
Xiaotong Yuan · Ping Li · Tong Zhang -
2016 Poster: Semiparametric Differential Graph Models »
Pan Xu · Quanquan Gu -
2016 Poster: Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation »
Xiaotong Yuan · Ping Li · Tong Zhang · Qingshan Liu · Guangcan Liu -
2015 Poster: Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling »
Zheng Qu · Peter Richtarik · Tong Zhang -
2015 Poster: Local Smoothness in Variance Reduced Optimization »
Daniel Vainsencher · Han Liu · Tong Zhang -
2015 Poster: High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality »
Zhaoran Wang · Quanquan Gu · Yang Ning · Han Liu -
2015 Poster: Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding »
Rie Johnson · Tong Zhang -
2015 Spotlight: Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding »
Rie Johnson · Tong Zhang -
2014 Poster: Sparse PCA with Oracle Property »
Quanquan Gu · Zhaoran Wang · Han Liu -
2014 Poster: Robust Tensor Decomposition with Gross Corruption »
Quanquan Gu · Huan Gui · Jiawei Han -
2013 Poster: Accelerating Stochastic Gradient Descent using Predictive Variance Reduction »
Rie Johnson · Tong Zhang -
2013 Poster: Accelerated Mini-Batch Stochastic Dual Coordinate Ascent »
Shai Shalev-Shwartz · Tong Zhang -
2012 Workshop: Modern Nonparametric Methods in Machine Learning »
Sivaraman Balakrishnan · Arthur Gretton · Mladen Kolar · John Lafferty · Han Liu · Tong Zhang -
2012 Poster: Selective Labeling via Error Bound Minimization »
Quanquan Gu · Tong Zhang · Chris Ding · Jiawei Han -
2011 Poster: Learning to Search Efficiently in High Dimensions »
Zhen Li · Huazhong Ning · Liangliang Cao · Tong Zhang · Yihong Gong · Thomas S Huang -
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang -
2011 Poster: Greedy Model Averaging »
Dong Dai · Tong Zhang -
2010 Poster: Deep Coding Network »
Yuanqing Lin · Tong Zhang · Shenghuo Zhu · Kai Yu -
2010 Poster: Agnostic Active Learning Without Constraints »
Alina Beygelzimer · Daniel Hsu · John Langford · Tong Zhang -
2009 Poster: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2009 Poster: Nonlinear Learning using Local Coordinate Coding »
Kai Yu · Tong Zhang · Yihong Gong -
2009 Oral: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2008 Poster: Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models »
Tong Zhang -
2008 Oral: Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models »
Tong Zhang -
2008 Poster: Sparse Online Learning via Truncated Gradient »
John Langford · Lihong Li · Tong Zhang -
2008 Spotlight: Sparse Online Learning via Truncated Gradient »
John Langford · Lihong Li · Tong Zhang -
2008 Poster: Multi-stage Convex Relaxation for Learning with Sparse Regularization »
Tong Zhang -
2007 Poster: A General Boosting Method and its Application to Learning Ranking Functions for Web Search »
Zhaohui Zheng · Hongyuan Zha · Tong Zhang · Olivier Chapelle · Keke Chen · Gordon Sun -
2007 Poster: The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information »
John Langford · Tong Zhang -
2006 Poster: Learning on Graph with Laplacian Regularization »
Rie Ando · Tong Zhang