Title: Simple Fixes for Adaptive Gradient Methods for Nonconvex Min-Max Optimization
Abstract: Adaptive gradient methods such as AdaGrad and Adam adjust their stepsizes on the fly in a parameter-agnostic manner and have been successful in nonconvex minimization. For nonconvex minimax optimization, however, direct extensions of such adaptive optimizers without proper time-scale separation may fail to work in practice. In fact, even for a quadratic example, the naive combination of Gradient Descent Ascent with any existing adaptive stepsize scheme provably diverges if the initial primal-dual stepsize ratio is not carefully chosen. We introduce two simple fixes for these adaptive methods, allowing automatic adaptation to the time-scale separation necessary for fast convergence. The resulting algorithms are fully parameter-agnostic and achieve near-optimal complexities in both the deterministic and stochastic settings of nonconvex-strongly-concave minimax problems, without a priori knowledge of problem-specific parameters. This is based on joint work with Junchi Yang and Xiang Li.
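The flavor of the fix can be illustrated with a short sketch. The snippet below is a minimal, illustrative implementation (not the authors' reference code), assuming a TiAda-style coupling in which the primal stepsize is driven by the maximum of the primal and dual gradient accumulators; the toy objective f(x, y) = xy - y^2/2, the exponents, and the base stepsizes are all illustrative choices.

```python
import numpy as np

# Illustrative sketch only, not the authors' reference implementation.
# Adaptive GDA where the x-stepsize is coupled to the y-gradient accumulator,
# so the primal side automatically slows down until the dual side catches up.

def grad(x, y):
    # Toy objective f(x, y) = x*y - 0.5*y^2 (strongly concave in y).
    # max_y f(x, y) = 0.5*x^2, so the primal stationary point is x = 0.
    return y, x - y  # (df/dx, df/dy)

def tiada_style_gda(x0=2.0, y0=-1.0, eta_x=0.1, eta_y=0.1,
                    alpha=0.6, beta=0.4, steps=2000):
    x, y = x0, y0
    vx = vy = 0.0  # AdaGrad-style second-moment accumulators
    for _ in range(steps):
        gx, gy = grad(x, y)
        vx += gx ** 2
        vy += gy ** 2
        # Key coupling: the primal stepsize also sees vy via max(vx, vy),
        # which enforces a time-scale separation without manual tuning.
        x -= eta_x / (max(vx, vy) ** alpha + 1e-12) * gx
        y += eta_y / (vy ** beta + 1e-12) * gy
    return x, y

if __name__ == "__main__":
    x, y = tiada_style_gda()
    print(f"x = {x:.4f}, y = {y:.4f}")  # both should approach the stationary point (0, 0)
```

The exponents alpha > beta and the max-coupling are the illustrative ingredients here; they make the effective primal-dual stepsize ratio adapt over time rather than being fixed by the initial choice of eta_x and eta_y.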
Author Information
Niao He (ETH Zurich)
More from the Same Authors
- 2022: TiAda: A Time-scale Adaptive Algorithm For Nonconvex Minimax Optimization (Xiang Li · Junchi Yang · Niao He)
- 2022: Uniform Convergence and Generalization for Nonconvex Stochastic Minimax Problems (Siqi Zhang · Yifan Hu · Liang Zhang · Niao He)
- 2022 Poster: Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization (Junchi Yang · Xiang Li · Niao He)
- 2022 Poster: Sharp Analysis of Stochastic Optimization under Global Kurdyka-Lojasiewicz Inequality (Ilyas Fatkhullin · Jalal Etesami · Niao He · Negar Kiyavash)
- 2022 Poster: Bring Your Own Algorithm for Optimal Differentially Private Stochastic Minimax Optimization (Liang Zhang · Kiran Thekumparampil · Sewoong Oh · Niao He)
- 2022 Poster: Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions (Saeed Masiha · Saber Salehkaleybar · Niao He · Negar Kiyavash · Patrick Thiran)