Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and trains efficiently with simple first-order algorithms. Despite this great empirical success, the reason behind it is far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.
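As a rough illustration of the kind of procedure the abstract describes, the NumPy sketch below runs gradient descent with a normalization step on a small two-layer network whose first layer is one filter shared across non-overlapping patches plus a fixed shortcut. Everything specific here is an assumption made for illustration and is not taken from the paper: the teacher-student data model, the ReLU activation, the squared loss, the fixed shortcut vector `v`, normalizing `v + w` to unit length, the unit ball for the second layer, the step size, and all function names.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def forward(Z, w, a, v):
    # Z: (n, p, k) holds k non-overlapping patches of dimension p per sample.
    # The effective filter is the shortcut v plus the trainable weight w.
    return relu(np.einsum("npk,p->nk", Z, v + w)) @ a


def gd_step(Z, y, w, a, v, lr):
    # One gradient step on the squared loss, followed by normalization.
    pre = np.einsum("npk,p->nk", Z, v + w)      # pre-activations, (n, k)
    hidden = relu(pre)
    residual = hidden @ a - y                   # (n,)
    n = len(y)
    grad_a = hidden.T @ residual / n
    gate = (pre > 0) * a                        # ReLU derivative times a_j
    grad_w = np.einsum("n,nk,npk->p", residual, gate, Z) / n
    a = a - lr * grad_a
    u = v + w - lr * grad_w
    u = u / np.linalg.norm(u)                   # assumed normalization: ||v + w|| = 1
    return u - v, a


def train(n=512, p=4, k=3, steps=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p, k))
    v = np.zeros(p)
    v[0] = 1.0                                  # hypothetical fixed shortcut direction
    # Hypothetical teacher network that generates the labels.
    u_star = v + 0.5 * rng.standard_normal(p)
    u_star /= np.linalg.norm(u_star)
    a_star = rng.standard_normal(k)
    y = relu(np.einsum("npk,p->nk", Z, u_star)) @ a_star
    w = np.zeros(p)                             # first layer initialized at 0
    a = rng.standard_normal(k)
    a *= rng.uniform() / np.linalg.norm(a)      # second layer: a point in the unit ball
    losses = []
    for _ in range(steps):
        losses.append(0.5 * np.mean((forward(Z, w, a, v) - y) ** 2))
        w, a = gd_step(Z, y, w, a, v, lr)
    return w, a, v, losses
```

Calling `train()` returns the final weights, the shortcut vector, and the recorded loss values, which can be inspected to see how training behaves in this toy setting. The sketch makes no claim to reproduce the paper's experiments or guarantees.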
Author Information
Tianyi Liu (Georgia Institute of Technology)
Minshuo Chen (Georgia Tech)
Mo Zhou (Duke University)
Simon Du (Institute for Advanced Study)
Enlu Zhou (Georgia Institute of Technology)
Tuo Zhao (Gatech)
More from the Same Authors
-
2022 : RLCG: When Reinforcement Learning Meets Coarse Graining »
Shenghao Wu · Tianyi Liu · Zhirui Wang · Wen Yan · Yingxiang Yang -
2022 : Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2023 Poster: Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations »
Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang -
2023 Poster: Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms »
Shenao Zhang · Boyi Liu · Zhaoran Wang · Tuo Zhao -
2023 Poster: Bayesian Risk-Averse Q-Learning with Streaming Observations »
Yuhao Wang · Enlu Zhou -
2023 Poster: Module-wise Adaptive Distillation for Multimodality Foundation Models »
Chen Liang · Jiahui Yu · Ming-Hsuan Yang · Matthew Brown · Yin Cui · Tuo Zhao · Boqing Gong · Tianyi Zhou -
2023 Poster: Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms »
Alexander Bukharin · Yan Li · Yue Yu · Qingru Zhang · Zhehui Chen · Simiao Zuo · Chao Zhang · Songan Zhang · Tuo Zhao -
2023 Poster: Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement »
Hui Yuan · Kaixuan Huang · Chengzhuo Ni · Minshuo Chen · Mengdi Wang -
2022 Poster: Bayesian Risk Markov Decision Processes »
Yifan Lin · Yuxuan Ren · Enlu Zhou -
2022 Poster: On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds »
Biraj Dahal · Alexander Havrilla · Minshuo Chen · Tuo Zhao · Wenjing Liao -
2021 Poster: Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL »
Minshuo Chen · Yan Li · Ethan Wang · Zhuoran Yang · Zhaoran Wang · Tuo Zhao -
2021 Poster: Understanding Deflation Process in Over-parametrized Tensor Decomposition »
Rong Ge · Yunwei Ren · Xiang Wang · Mo Zhou -
2020 Session: Orals & Spotlights Track 34: Deep Learning »
Tuo Zhao · Jimmy Ba -
2020 Poster: Bayesian Optimization of Risk Measures »
Sait Cakmak · Raul Astudillo · Peter Frazier · Enlu Zhou -
2020 Poster: Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality »
Yi Zhang · Orestis Plevrakis · Simon Du · Xingguo Li · Zhao Song · Sanjeev Arora -
2020 Poster: Differentiable Top-k with Optimal Transport »
Yujia Xie · Hanjun Dai · Minshuo Chen · Bo Dai · Tuo Zhao · Hongyuan Zha · Wei Wei · Tomas Pfister -
2020 Poster: Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective »
Kaixuan Huang · Yuqing Wang · Molei Tao · Tuo Zhao -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity »
Simon Du · Jason Lee · Gaurav Mahajan · Ruosong Wang -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Towards Understanding Hierarchical Learning: Benefits of Neural Representations »
Minshuo Chen · Yu Bai · Jason Lee · Tuo Zhao · Huan Wang · Caiming Xiong · Richard Socher -
2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2020 Spotlight: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Late-Breaking Papers (Talks) »
David Silver · Simon Du · Matthias Plappert -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels »
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu -
2019 Poster: Acceleration via Symplectic Discretization of High-Resolution Differential Equations »
Bin Shi · Simon Du · Weijie Su · Michael Jordan -
2019 Poster: Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle »
Simon Du · Yuping Luo · Ruosong Wang · Hanrui Zhang -
2019 Poster: Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds »
Minshuo Chen · Haoming Jiang · Wenjing Liao · Tuo Zhao -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2018 Poster: Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization »
Minshuo Chen · Lin Yang · Mengdi Wang · Tuo Zhao -
2018 Poster: Provable Gaussian Embedding with One Observation »
Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang -
2018 Poster: Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization »
Tianyi Liu · Shiyang Li · Jianping Shi · Enlu Zhou · Tuo Zhao