Skip to yearly menu bar Skip to main content

Workshop: Mathematics of Modern Machine Learning (M3L)

MoXCo:How I learned to stop exploring and love my local minima?

Esha Singh · Shoham Sabach · Yu-Xiang Wang

Abstract: Deep Neural Networks (DNNs) are well-known for their generalization capabilities despite overparameterization. This is commonly attributed to the optimizer’s ability to find “good” solutions within high-dimensional loss landscapes. However, widely employed adaptive optimizers, such as ADAM, may suffer from subpar generalization. This paper presents an innovative methodology, $\textit{MoXCo}$, to address these concerns by designing adaptive optimizers that not only expedite exploration with faster convergence speeds but also ensure the avoidance of over-exploitation in specific parameter regimes, ultimately leading to convergence to good solutions.

Chat is not available.