Timezone: »
We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to the optimization over probability distributions with quantitative runtime guarantee. The algorithm consists of an inner loop and outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can be interpreted as an extension of the Langevin algorithm to naturally handle nonlinear functional on the probability space. An important application of the proposed method is the optimization of neural network in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but quantitative convergence rate can be challenging to obtain. By adapting finite-dimensional convex optimization theory into the space of measures, we not only establish global convergence of PDA for two-layer mean field neural networks under more general settings and simpler analysis, but also provide quantitative polynomial runtime guarantee. Our theoretical results are supported by numerical simulations on neural networks with reasonable size.
Author Information
Atsushi Nitanda (Kyushu Institute of Technology / RIKEN)
Denny Wu (University of Toronto & Vector Institute)
Taiji Suzuki (The University of Tokyo/RIKEN-AIP)
More from the Same Authors
-
2021 Spotlight: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space »
Taiji Suzuki · Atsushi Nitanda -
2022 Poster: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning »
Tomoya Murata · Taiji Suzuki -
2022 : Reducing Communication in Nonconvex Federated Learning with a Novel Single-Loop Variance Reduction Method »
Kazusato Oko · Shunta Akiyama · Tomoya Murata · Taiji Suzuki -
2023 Poster: Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond »
Taiji Suzuki · Denny Wu · Kazusato Oko · Atsushi Nitanda -
2023 Poster: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu -
2023 Poster: Gradient-Based Feature Learning under Structured Data »
Alireza Mousavi-Hosseini · Denny Wu · Taiji Suzuki · Murat Erdogdu -
2023 Poster: Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction »
Taiji Suzuki · Denny Wu · Atsushi Nitanda -
2022 Spotlight: Lightning Talks 4A-2 »
Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng -
2022 Spotlight: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning »
Tomoya Murata · Taiji Suzuki -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime »
Naoki Nishikawa · Taiji Suzuki · Atsushi Nitanda · Denny Wu -
2022 Poster: Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization »
Yuri Kinoshita · Taiji Suzuki -
2021 Poster: Differentiable Multiple Shooting Layers »
Stefano Massaroli · Michael Poli · Sho Sonoda · Taiji Suzuki · Jinkyoo Park · Atsushi Yamashita · Hajime Asama -
2021 Poster: Generalization Bounds for Graph Embedding Using Negative Sampling: Linear vs Hyperbolic »
Atsushi Suzuki · Atsushi Nitanda · jing wang · Linchuan Xu · Kenji Yamanishi · Marc Cavazza -
2021 Poster: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space »
Taiji Suzuki · Atsushi Nitanda -
2020 : Poster Session 3 (gather.town) »
Denny Wu · Chengrun Yang · Tolga Ergen · sanae lotfi · Charles Guille-Escuret · Boris Ginsburg · Hanbake Lyu · Cong Xie · David Newton · Debraj Basu · Yewen Wang · James Lucas · MAOJIA LI · Lijun Ding · Jose Javier Gonzalez Ortiz · Reyhane Askari Hemmat · Zhiqi Bu · Neal Lawton · Kiran Thekumparampil · Jiaming Liang · Lindon Roberts · Jingyi Zhu · Dongruo Zhou -
2020 : Contributed talks in Session 3 (Zoom) »
Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang -
2020 : Contributed Video: When Does Preconditioning Help or Hurt Generalization?, Denny Wu »
Denny Wu -
2020 Poster: On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression »
Denny Wu · Ji Xu -
2019 Poster: Data Cleansing for Models Trained with SGD »
Satoshi Hara · Atsushi Nitanda · Takanori Maehara -
2019 Poster: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond »
Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu -
2019 Spotlight: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond »
Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu -
2018 : Poster Session I »
Aniruddh Raghu · Daniel Jarrett · Kathleen Lewis · Elias Chaibub Neto · Nicholas Mastronarde · Shazia Akbar · Chun-Hung Chao · Henghui Zhu · Seth Stafford · Luna Zhang · Jen-Tang Lu · Changhee Lee · Adityanarayanan Radhakrishnan · Fabian Falck · Liyue Shen · Daniel Neil · Yusuf Roohani · Aparna Balagopalan · Brett Marinelli · Hagai Rossman · Sven Giesselbach · Jose Javier Gonzalez Ortiz · Edward De Brouwer · Byung-Hoon Kim · Rafid Mahmood · Tzu Ming Hsu · Antonio Ribeiro · Rumi Chunara · Agni Orfanoudaki · Kristen Severson · Mingjie Mai · Sonali Parbhoo · Albert Haque · Viraj Prabhu · Di Jin · Alena Harley · Geoffroy Dubourg-Felonneau · Xiaodan Hu · Maithra Raghu · Jonathan Warrell · Nelson Johansen · Wenyuan Li · Marko Järvenpää · Satya Narayan Shukla · Sarah Tan · Vincent Fortuin · Beau Norgeot · Yi-Te Hsu · Joel H Saltz · Veronica Tozzo · Andrew Miller · Guillaume Ausset · Azin Asgarian · Francesco Paolo Casale · Antoine Neuraz · Bhanu Pratap Singh Rawat · Turgay Ayer · Xinyu Li · Mehul Motani · Nathaniel Braman · Laetitia M Shao · Adrian Dalca · Hyunkwang Lee · Emma Pierson · Sandesh Ghimire · Yuji Kawai · Owen Lahav · Anna Goldenberg · Denny Wu · Pavitra Krishnaswamy · Colin Pawlowski · Arijit Ukil · Yuhui Zhang -
2017 Poster: Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization »
Tomoya Murata · Taiji Suzuki -
2017 Poster: Trimmed Density Ratio Estimation »
Song Liu · Akiko Takeda · Taiji Suzuki · Kenji Fukumizu -
2014 Poster: Stochastic Proximal Gradient Descent with Acceleration Techniques »
Atsushi Nitanda