Skip to yearly menu bar Skip to main content

Contributed talks
Workshop: OPT 2023: Optimization for Machine Learning

Contributed Talks 2: *An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization* and *Practical Principled Policy Optimization for Finite MDPs*

Guy Kornowski · Michael Lu · Aaron Sidford

[ ]
Fri 15 Dec 9 a.m. PST — 9:30 a.m. PST


Two 15 minute talks:

  1. An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization, Guy Kornowski
  • We study the complexity of producing (\delta,\epsilon)-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations.Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of \Omega(d^{3/2}) where d is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity O(d\delta^{-1}\epsilon^{-3}), which is optimal (up to numerical constants) with respect to d and also optimal with respect to the accuracy parameters \delta,\epsilon, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability.Our analysis is based on a simple yet powerful geometric lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
  1. Practical Principled Policy Optimization for Finite MDPs, Michael Lu
  • We consider (stochastic) softmax policy gradient (PG) methods for finite Markov Decision Processes (MDP). While the PG objective is not concave, recent research has used smoothness and gradient dominance to achieve convergence to an optimal policy. However, these results depend on having extensive knowledge of the environment, such as the optimal action or the true mean reward vector, to configure the algorithm parameters. This makes the resulting algorithms impractical in real applications. To alleviate this problem, we propose PG methods that employ an Armijo line-search in the deterministic setting and an exponentially decreasing step-size in the stochastic setting. We demonstrate that these proposed algorithms offer similar theoretical guarantees as previous works but now do not require the knowledge of oracle-like quantities. Furthermore, we apply the similar techniques to develop practical, theoretically sound entropy-regularized methods for both deterministic and stochastic settings. Finally, we empirically compare the proposed methods with previous approaches in single-state MDP environments.

Chat is not available.