Poster
Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity
Qihao Zhou · Haishan Ye · Luo Luo
West Ballroom A-D #6110
Wed 11 Dec 11 a.m. PST — 2 p.m. PST
Abstract:
This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective by combining mini-batch client sampling with variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ communication complexity, and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. All of the above complexities (nearly) match the corresponding lower bounds. For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm attains upper bounds on communication rounds, communication complexity, and local gradient calls of $\mathcal O(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$, respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to demonstrate the empirical advantages of the proposed method.
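To make the abstract's ingredients concrete, the sketch below combines an SVRG-style variance-reduced gradient estimator, mini-batch client sampling, and an optimistic (Popov-type) gradient step with projection onto a bounded constraint set for a finite-sum saddle-point problem. It is only an illustration of these building blocks under stated assumptions, not the authors' SVOGS algorithm (which additionally employs gradient sliding); the names `grad_ops`, `vr_estimate`, `project`, and all parameters are hypothetical placeholders.

```python
# Illustrative sketch: one variance-reduced optimistic-gradient round with
# mini-batch client sampling for min_x max_y F(x, y) = (1/n) * sum_i f_i(x, y).
# NOT the exact SVOGS method; names and structure are assumptions for illustration.
import numpy as np

def project(z, center, radius):
    """Euclidean projection of z onto the ball of the given radius around `center`."""
    d = z - center
    norm = np.linalg.norm(d)
    return z if norm <= radius else center + radius * d / norm

def vr_optimistic_round(z, g_prev, anchor, full_grad_anchor, grad_ops, rng,
                        batch_size, step_size, center, radius):
    """One round of a variance-reduced optimistic (Popov-type) update.

    z                : current iterate (x stacked with y)
    g_prev           : gradient estimate from the previous round (reused, Popov-style)
    anchor           : snapshot point used for variance reduction
    full_grad_anchor : (1/n) * sum_i g_i(anchor), computed at the last full synchronization
    grad_ops         : list of n callables, g_i(z) = (grad_x f_i(z), -grad_y f_i(z))
    """
    n = len(grad_ops)
    batch = rng.choice(n, size=batch_size, replace=False)  # sampled clients this round

    def vr_estimate(point):
        # SVRG-style estimator: full gradient at the anchor plus a mini-batch
        # correction, so only `batch_size` clients communicate in this round.
        corr = np.mean([grad_ops[i](point) - grad_ops[i](anchor) for i in batch], axis=0)
        return full_grad_anchor + corr

    # Optimistic update: extrapolate with the previous estimate, then correct
    # with a fresh estimate at the extrapolated point; project after each step.
    z_half = project(z - step_size * g_prev, center, radius)
    g_half = vr_estimate(z_half)
    z_next = project(z - step_size * g_half, center, radius)
    return z_next, g_half  # g_half becomes g_prev in the next round
```

In this sketch, variance reduction keeps the per-round communication proportional to the mini-batch size rather than $n$, while the reused estimate `g_prev` is what makes the step optimistic instead of a plain extragradient step; the periodic full synchronization that refreshes `anchor` and `full_grad_anchor` is omitted for brevity.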