Poster
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Jiawei Zhang · Yushun Zhang · Mingyi Hong · Ruoyu Sun · Zhi-Quan Luo
Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ (sample size) neurons when the activation is smooth. First, we prove that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), the network's expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local minima or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice region. Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer. It is expected that projected gradient methods converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work. Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
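To make the width condition and the constrained formulation concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It computes the smallest width satisfying $m \geq 2n/d$ and runs projected gradient descent on a 1-hidden-layer tanh network. The abstract does not specify the "nice local region", so the feasible set used here (a Frobenius-norm ball of hypothetical radius `radius` around the initial hidden weights) is purely an illustrative assumption, as are all function names, the squared loss, the initialization, and the step size.

```python
import math
import numpy as np


def min_width(n, d):
    """Smallest integer width m with m >= 2n/d (n samples, input dimension d)."""
    return math.ceil(2 * n / d)


def loss_and_grads(W, v, X, y):
    """Squared loss of f(x) = v^T tanh(W x) and its gradients w.r.t. W and v."""
    n = X.shape[0]
    H = np.tanh(X @ W.T)                      # hidden activations, shape (n, m)
    r = H @ v - y                             # residuals, shape (n,)
    loss = 0.5 * np.mean(r ** 2)
    grad_v = H.T @ r / n                      # shape (m,)
    G = (r[:, None] * v[None, :]) * (1.0 - H ** 2)  # chain rule through tanh
    grad_W = G.T @ X / n                      # shape (m, d)
    return loss, grad_W, grad_v


def project_ball(P, center, radius):
    """Euclidean (Frobenius) projection of P onto the ball around `center`."""
    diff = P - center
    norm = np.linalg.norm(diff)
    if norm <= radius:
        return P
    return center + diff * (radius / norm)


def projected_gd(X, y, m, radius=1.0, lr=0.05, steps=300, seed=0):
    """Projected gradient descent; the ball constraint is an ASSUMED stand-in
    for the paper's feasible region, which the abstract does not define."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W0 = rng.normal(size=(m, d)) / np.sqrt(d)  # hypothetical initialization
    v = rng.normal(size=m) / np.sqrt(m)
    W = W0.copy()
    losses = []
    for _ in range(steps):
        loss, grad_W, grad_v = loss_and_grads(W, v, X, y)
        losses.append(loss)
        v = v - lr * grad_v                                # unconstrained block
        W = project_ball(W - lr * grad_W, W0, radius)      # constrained block
    return W, v, W0, losses


if __name__ == "__main__":
    n, d = 20, 10
    m = min_width(n, d)                        # fewer than n neurons
    rng = np.random.default_rng(1)
    X, y = rng.normal(size=(n, d)), rng.normal(size=n)
    W, v, W0, losses = projected_gd(X, y, m)
    print(m, losses[0], losses[-1], np.linalg.norm(W - W0))
```

For $n = 20$ and $d = 10$ the width condition gives $m = 4$, well below the sample size. Whether such a run approaches zero training loss depends on the actual feasible region and conditions in the paper, which this sketch does not reproduce.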
Author Information
Jiawei Zhang (The Chinese University of Hong Kong, Shenzhen)
Yushun Zhang (The Chinese University of Hong Kong, Shenzhen)
I am a Ph.D. student under the supervision of Prof. Tom Zhi-Quan Luo and Prof. Tong Zhang. I am interested in understanding deep learning.
Mingyi Hong (University of Minnesota)
Ruoyu Sun (University of Illinois at Urbana-Champaign)
Zhi-Quan Luo (University of Minnesota, Twin Cities)
More from the Same Authors
-
2021 : A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective »
xinwei zhang · Mingyi Hong · Nicola Elia -
2022 : A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective »
xinwei zhang · Nicola Elia · Mingyi Hong -
2022 : Building Large Machine Learning Models from Small Distributed Models: A Layer Matching Approach »
xinwei zhang · Bingqing Song · Mehrdad Honarkhah · Jie Ding · Mingyi Hong -
2022 : On the Robustness of deep learning-based MRI Reconstruction to image transformations »
jinghan jia · Mingyi Hong · Yimeng Zhang · Mehmet Akcakaya · Sijia Liu -
2023 Poster: PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization »
Jiancong Xiao · Ruoyu Sun · Zhi-Quan Luo -
2023 Poster: Balanced Training for Sparse GANs »
Yite Wang · Jing Wu · NAIRA HOVAKIMYAN · Ruoyu Sun -
2023 Poster: Imitation Learning from Imperfection: Theoretical Justifications and Algorithms »
Ziniu Li · Tian Xu · Zeyu Qin · Yang Yu · Zhi-Quan Luo -
2023 Poster: Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning »
Siliang Zeng · Chenliang Li · Alfredo Garcia · Mingyi Hong -
2023 Poster: VCC: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens »
Zhanpeng Zeng · Cole Hawkins · Mingyi Hong · Aston Zhang · Nikolaos Pappas · Vikas Singh · Shuai Zheng -
2023 Poster: SLM: A Smoothed First-order Lagrangian Method for Structured Constrained Nonconvex Minimization »
Songtao Lu · Jiawei Zhang -
2023 Poster: Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning »
Yihua Zhang · Yimeng Zhang · Aochuan Chen · jinghan jia · Jiancheng Liu · Gaowen Liu · Mingyi Hong · Shiyu Chang · Sijia Liu -
2023 Poster: A Unified Framework for Inference-Stage Backdoor Defenses »
Xun Xian · Ganghua Wang · Jayanth Srinivasa · Ashish Kundu · Xuan Bi · Mingyi Hong · Jie Ding -
2023 Oral: Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning »
Siliang Zeng · Chenliang Li · Alfredo Garcia · Mingyi Hong -
2022 Spotlight: Stability Analysis and Generalization Bounds of Adversarial Training »
Jiancong Xiao · Yanbo Fan · Ruoyu Sun · Jue Wang · Zhi-Quan Luo -
2022 Spotlight: Adam Can Converge Without Any Modification On Update Rules »
Yushun Zhang · Congliang Chen · Naichen Shi · Ruoyu Sun · Zhi-Quan Luo -
2022 Spotlight: Lightning Talks 6B-1 »
Yushun Zhang · Duc Nguyen · Jiancong Xiao · Wei Jiang · Yaohua Wang · Yilun Xu · Zhen LI · Anderson Ye Zhang · Ziming Liu · Fangyi Zhang · Gilles Stoltz · Congliang Chen · Gang Li · Yanbo Fan · Ruoyu Sun · Naichen Shi · Yibo Wang · Ming Lin · Max Tegmark · Lijun Zhang · Jue Wang · Ruoyu Sun · Tommi Jaakkola · Senzhang Wang · Zhi-Quan Luo · Xiuyu Sun · Zhi-Quan Luo · Tianbao Yang · Rong Jin -
2022 Spotlight: Lightning Talks 4A-3 »
Zhihan Gao · Yabin Wang · Xingyu Qu · Luziwei Leng · Mingqing Xiao · Bohan Wang · Yu Shen · Zhiwu Huang · Xingjian Shi · Qi Meng · Yupeng Lu · Diyang Li · Qingyan Meng · Kaiwei Che · Yang Li · Hao Wang · Huishuai Zhang · Zongpeng Zhang · Kaixuan Zhang · Xiaopeng Hong · Xiaohan Zhao · Di He · Jianguo Zhang · Yaofeng Tu · Bin Gu · Yi Zhu · Ruoyu Sun · Yuyang (Bernie) Wang · Zhouchen Lin · Qinghu Meng · Wei Chen · Wentao Zhang · Bin CUI · Jie Cheng · Zhi-Ming Ma · Mu Li · Qinghai Guo · Dit-Yan Yeung · Tie-Yan Liu · Jianxing Liao -
2022 Spotlight: Does Momentum Change the Implicit Regularization on Separable Data? »
Bohan Wang · Qi Meng · Huishuai Zhang · Ruoyu Sun · Wei Chen · Zhi-Ming Ma · Tie-Yan Liu -
2022 Poster: What is a Good Metric to Study Generalization of Minimax Learners? »
Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang -
2022 Poster: A Stochastic Linearized Augmented Lagrangian Method for Decentralized Bilevel Optimization »
Songtao Lu · Siliang Zeng · Xiaodong Cui · Mark Squillante · Lior Horesh · Brian Kingsbury · Jia Liu · Mingyi Hong -
2022 Poster: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence »
Boyi Liu · Jiayang Li · Zhuoran Yang · Hoi-To Wai · Mingyi Hong · Yu Nie · Zhaoran Wang -
2022 Poster: Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees »
Siliang Zeng · Chenliang Li · Alfredo Garcia · Mingyi Hong -
2022 Poster: Adam Can Converge Without Any Modification On Update Rules »
Yushun Zhang · Congliang Chen · Naichen Shi · Ruoyu Sun · Zhi-Quan Luo -
2022 Poster: Does Momentum Change the Implicit Regularization on Separable Data? »
Bohan Wang · Qi Meng · Huishuai Zhang · Ruoyu Sun · Wei Chen · Zhi-Ming Ma · Tie-Yan Liu -
2022 Poster: Stability Analysis and Generalization Bounds of Adversarial Training »
Jiancong Xiao · Yanbo Fan · Ruoyu Sun · Jue Wang · Zhi-Quan Luo -
2022 Poster: DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data »
Tiantian Fang · Ruoyu Sun · Alex Schwing -
2022 Poster: Advancing Model Pruning via Bi-level Optimization »
Yihua Zhang · Yuguang Yao · Parikshit Ram · Pu Zhao · Tianlong Chen · Mingyi Hong · Yanzhi Wang · Sijia Liu -
2022 Poster: Distributed Optimization for Overparameterized Problems: Achieving Optimal Dimension Independent Communication Complexity »
Bingqing Song · Ioannis Tsaknakis · Chung-Yiu Yau · Hoi-To Wai · Mingyi Hong -
2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo -
2021 : Contributed Talk 2: A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective »
xinwei zhang · Mingyi Hong · Nicola Elia -
2021 Poster: STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning »
Prashant Khanduri · PRANAY SHARMA · Haibo Yang · Mingyi Hong · Jia Liu · Ketan Rajawat · Pramod Varshney -
2021 Poster: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum »
Prashant Khanduri · Siliang Zeng · Mingyi Hong · Hoi-To Wai · Zhaoran Wang · Zhuoran Yang -
2021 Poster: Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data »
Dachao Lin · Ruoyu Sun · Zhihua Zhang -
2020 Poster: Towards a Better Global Loss Landscape of GANs »
Ruoyu Sun · Tiantian Fang · Alex Schwing -
2020 Oral: Towards a Better Global Loss Landscape of GANs »
Ruoyu Sun · Tiantian Fang · Alex Schwing -
2020 Poster: A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems »
Jiawei Zhang · Peijun Xiao · Ruoyu Sun · Zhiquan Luo -
2020 Poster: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems »
Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong -
2020 Poster: Understanding Gradient Clipping in Private SGD: A Geometric Perspective »
Xiangyi Chen · Steven Wu · Mingyi Hong -
2020 Poster: Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms »
Xiangyi Chen · Tiancong Chen · Haoran Sun · Steven Wu · Mingyi Hong -
2020 Spotlight: Understanding Gradient Clipping in Private SGD: A Geometric Perspective »
Xiangyi Chen · Steven Wu · Mingyi Hong -
2020 Spotlight: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems »
Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong -
2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2019 : Lunch break and poster »
Felix Sattler · Khaoula El Mekkaoui · Neta Shoham · Cheng Hong · Florian Hartmann · Boyue Li · Daliang Li · Sebastian Caldas Rivera · Jianyu Wang · Kartikeya Bhardwaj · Tribhuvanesh Orekondy · YAN KANG · Dashan Gao · Mingshu Cong · Xin Yao · Songtao Lu · JIAHUAN LUO · Shicong Cen · Peter Kairouz · Yihan Jiang · Tzu Ming Hsu · Aleksei Triastcyn · Yang Liu · Ahmed Khaled Ragab Bayoumi · Zhicong Liang · Boi Faltings · Seungwhan Moon · Suyi Li · Tao Fan · Tianchi Huang · Chunyan Miao · Hang Qi · Matthew Brown · Lucas Glass · Junpu Wang · Wei Chen · Radu Marculescu · tomer avidor · Xueyang Wu · Mingyi Hong · Ce Ju · John Rush · Ruixiao Zhang · Youchi ZHOU · Françoise Beaufays · Yingxuan Zhu · Lei Xia -
2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost »
Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang -
2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation »
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang -
2019 Poster: ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization »
Xiangyi Chen · Sijia Liu · Kaidi Xu · Xingguo Li · Xue Lin · Mingyi Hong · David Cox -
2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2018 Poster: Adding One Neuron Can Eliminate All Bad Local Minima »
SHIYU LIANG · Ruoyu Sun · Jason Lee · R. Srikant -
2014 Poster: Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization »
Meisam Razaviyayn · Mingyi Hong · Zhi-Quan Luo · Jong-Shi Pang -
2014 Poster: Parallel Direction Method of Multipliers »
Huahua Wang · Arindam Banerjee · Zhi-Quan Luo -
2013 Poster: On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization »
Ke Hou · Zirui Zhou · Anthony Man-Cho So · Zhi-Quan Luo