Title: Simple Fixes for Adaptive Gradient Methods for Nonconvex Min-Max Optimization
Abstract: Adaptive gradient methods such as AdaGrad and Adam can adjust stepsizes on the fly in a parameter-agnostic manner and have been successful in nonconvex minimization. When it comes to nonconvex minimax optimization, however, direct extensions of such adaptive optimizers without proper time-scale separation may fail to work in practice. In fact, even for a quadratic example, the naive combination of Gradient Descent Ascent with any existing adaptive stepsize provably diverges if the initial primal-dual stepsize ratio is not carefully chosen. We introduce two simple fixes for these adaptive methods, allowing automatic adaptation to the time-scale separation necessary for fast convergence. The resulting algorithms are fully parameter-agnostic and achieve near-optimal complexities in both deterministic and stochastic settings for nonconvex-strongly-concave minimax problems, without a priori knowledge of problem-specific parameters. This is based on joint work with Junchi Yang and Xiang Li.
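To illustrate the time-scale separation phenomenon the abstract refers to, here is a minimal sketch of plain (non-adaptive) two-timescale Gradient Descent Ascent on a toy quadratic problem that is strongly concave in the dual variable. The objective f(x, y) = -x^2/2 + 2xy - y^2/2 and the stepsizes are illustrative choices, not the specific example from the talk: with a small primal-dual stepsize ratio the iterates contract to the stationary point, while the flipped ratio makes them blow up.

```python
# Sketch: two-timescale Gradient Descent Ascent (GDA) on the hypothetical
# quadratic f(x, y) = -x**2/2 + 2*x*y - y**2/2, strongly concave in y.
# The stepsize ratio eta_x / eta_y decides convergence vs. divergence.

def gda(eta_x, eta_y, steps=200):
    x, y = 1.0, 1.0
    for _ in range(steps):
        gx = -x + 2.0 * y                        # grad_x f(x, y)
        gy = 2.0 * x - y                         # grad_y f(x, y)
        x, y = x - eta_x * gx, y + eta_y * gy    # simultaneous descent/ascent
    return (x * x + y * y) ** 0.5                # distance to the point (0, 0)

print(gda(0.05, 0.5))   # small primal-dual ratio: iterates shrink toward (0, 0)
print(gda(0.5, 0.05))   # large primal-dual ratio: iterates diverge
```

For this linear update the iteration matrix has spectral radius below 1 only when the primal stepsize is sufficiently smaller than the dual one, which is exactly the kind of stepsize-ratio sensitivity the proposed fixes adapt to automatically.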