NIPS 2016


Optimizing the Optimizers

Maren Mahsereci · Alex Davies · Philipp Hennig

Area 2

Optimization problems in machine learning have aspects that make them more challenging than traditional settings, such as stochasticity and parameters with side effects (e.g., the batch size and structure). The field has invented many different approaches to deal with these demands. Unfortunately, and intriguingly, this extra functionality seems to invariably necessitate the introduction of tuning parameters: step sizes, decay rates, cycle lengths, batch sampling distributions, and so on. Such parameters are not present, or at least not as prominent, in classic optimization methods. But getting them right is frequently crucial, and necessitates inconvenient human “babysitting”.

Recent work has increasingly tried to eliminate such fiddle factors, typically by statistical estimation. This also includes automatic selection of external parameters like the batch size or batch structure, which have not traditionally been treated as part of the optimization task. Several different strategies have now been proposed, but they are not always compatible with each other, and they lack a common framework that would foster both conceptual and algorithmic interoperability. This workshop aims to provide a forum for the nascent community studying the automation of parameter tuning in optimization routines.

Among the questions to be addressed by the workshop are:

* Is the prominence of tuning parameters a fundamental feature of stochastic optimization problems? Why do classic optimization methods manage to do well with virtually no free parameters?
* In which precise sense can the "optimization of optimization algorithms" be phrased as an inference / learning problem?
* Should parameters be inferred, and can they be, at design-time (by a human), at compile-time (by an external compiler with access to a meta-description of the problem), or at run-time (by the algorithm itself)?
* What are generic ways to learn parameters of algorithms, and what are the inherent difficulties in doing so? Is the goal to specialize to a particular problem, or to generalize over many problems?

In addition to the invited and already confirmed speakers, we will also invite contributed work from the community. Topics of interest include, but are not strictly limited to:

* Parameter adaptation for optimization algorithms
* Stochastic optimization methods
* Optimization methods adapted for specific applications
* Batch selection methods
* Convergence diagnostics for optimization algorithms



