Bayesian optimization has shown success in global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge gradient (dKG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than its derivative-free counterpart. dKG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference by automatically selecting a single directional derivative to retain. We also compute the dKG acquisition function and its gradient using a novel, fast, discretization-free technique. We show that dKG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.
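As a concrete illustration of the setup the abstract describes, the Python sketch below fits a Gaussian process jointly to noisy function values and gradients (via the closed-form derivative covariances of an RBF kernel) and estimates a knowledge-gradient-style acquisition by Monte Carlo over a fixed grid. This is an editor's sketch under stated assumptions, not the authors' implementation: the paper computes dKG with a discretization-free technique, supports batches, and handles incomplete gradients, none of which is reproduced here. The kernel hyperparameters (S2, ELL, NOISE), the toy objective, and the grids are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
S2, ELL, NOISE = 1.0, 0.5, 1e-4   # illustrative kernel variance, lengthscale, noise

def blocks(X1, X2):
    """RBF covariance blocks between function values and derivatives (1-D inputs)."""
    d = X1[:, None] - X2[None, :]
    e = S2 * np.exp(-d**2 / (2 * ELL**2))
    kff = e                                     # Cov(f(x1), f(x2))
    kfg = e * d / ELL**2                        # Cov(f(x1), f'(x2))
    kgg = e * (1.0 - d**2 / ELL**2) / ELL**2    # Cov(f'(x1), f'(x2))
    return kff, kfg, kgg

def joint_K(X):
    """Joint covariance of the stacked noisy observations [f(X); f'(X)]."""
    kff, kfg, kgg = blocks(X, X)
    return np.block([[kff, kfg], [kfg.T, kgg]]) + NOISE * np.eye(2 * len(X))

def posterior_mean(X, y, Z):
    """Posterior mean of f on grid Z, conditioning on values and gradients at X."""
    kff, kfg, _ = blocks(Z, X)
    Ks = np.hstack([kff, kfg])                  # cross-cov of f(Z) with [f(X); f'(X)]
    return Ks @ np.linalg.solve(joint_K(X), y)

def kg(x_cand, X, y, Z, n_mc=64):
    """Discretized Monte Carlo knowledge-gradient estimate (for minimization)."""
    m0 = posterior_mean(X, y, Z).min()          # current posterior minimum on the grid
    n = len(X)
    kff, kfg, kgg = blocks(np.array([x_cand]), X)
    C = np.vstack([np.hstack([kff, kfg]),       # value at x_cand vs data
                   np.hstack([-kfg, kgg])])     # gradient at x_cand vs data
    K = joint_K(X)
    m_c = C @ np.linalg.solve(K, y)             # predictive mean of (f, f') at x_cand
    prior = np.array([[S2, 0.0], [0.0, S2 / ELL**2]])
    S_c = prior - C @ np.linalg.solve(K, C.T) + NOISE * np.eye(2)
    L = np.linalg.cholesky(S_c)
    # Quantities of the augmented model that do not depend on the fantasy sample.
    X_new = np.append(X, x_cand)
    K_new = joint_K(X_new)
    kff_z, kfg_z, _ = blocks(Z, X_new)
    Ks_new = np.hstack([kff_z, kfg_z])
    mins = []
    for _ in range(n_mc):
        y_f = m_c + L @ rng.standard_normal(2)  # fantasy noisy (value, gradient) pair
        y_new = np.concatenate([y[:n], [y_f[0]], y[n:], [y_f[1]]])
        mins.append((Ks_new @ np.linalg.solve(K_new, y_new)).min())
    return m0 - np.mean(mins)                   # expected one-step drop in the minimum

if __name__ == "__main__":
    f = lambda x: np.sin(3 * x) + 0.5 * x       # hypothetical toy objective
    g = lambda x: 3 * np.cos(3 * x) + 0.5       # its exact gradient
    X = np.array([-1.0, 0.2, 1.1])              # initial design
    y = np.concatenate([f(X), g(X)])            # stacked [values; gradients]
    Z = np.linspace(-2.0, 2.0, 101)             # grid standing in for the domain
    cands = np.linspace(-2.0, 2.0, 41)
    scores = [kg(c, X, y, Z) for c in cands]
    print("next evaluation at x =", cands[int(np.argmax(scores))])

Note that gradient observations enter only through the extra covariance blocks; the downstream steps (posterior mean, fantasy sampling, the expected drop in the posterior minimum) are unchanged from a value-only knowledge-gradient computation, which is one reason derivative information integrates naturally into this family of acquisitions.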
Author Information
Jian Wu (AQR Capital Management)
Matthias Poloczek (Cornell University)
Andrew Wilson (Cornell University)
Peter Frazier (Cornell / Uber)
Peter Frazier is an Associate Professor in the School of Operations Research and Information Engineering at Cornell University, and a Staff Data Scientist at Uber. He received a Ph.D. in Operations Research and Financial Engineering from Princeton University in 2009. His research is at the intersection of machine learning and operations research, focusing on Bayesian optimization, multi-armed bandits, active learning, and Bayesian nonparametric statistics. He is an associate editor for Operations Research, ACM TOMACS, and IISE Transactions, and is the recipient of an AFOSR Young Investigator Award and an NSF CAREER Award.
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Oral: Bayesian Optimization with Gradients
  Tue. Dec 5th, 07:10–07:25 PM, Hall C
More from the Same Authors
- 2021 Poster: Constrained Two-step Look-Ahead Bayesian Optimization
  Yunxiang Zhang · Xiangyu Zhang · Peter Frazier
- 2021 Poster: Multi-Step Budgeted Bayesian Optimization with Unknown Evaluation Costs
  Raul Astudillo · Daniel Jiang · Maximilian Balandat · Eytan Bakshy · Peter Frazier
- 2021 Poster: Bayesian Optimization of Function Networks
  Raul Astudillo · Peter Frazier
- 2020 Poster: Bayesian Optimization of Risk Measures
  Sait Cakmak · Raul Astudillo · Peter Frazier · Enlu Zhou
- 2019 Workshop: Learning with Rich Experience: Integration of Learning Paradigms
  Zhiting Hu · Andrew Wilson · Chelsea Finn · Lisa Lee · Taylor Berg-Kirkpatrick · Ruslan Salakhutdinov · Eric Xing
- 2019 Poster: Practical Two-Step Lookahead Bayesian Optimization
  Jian Wu · Peter Frazier
- 2018 Workshop: Bayesian Deep Learning
  Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling
- 2017 Invited Talk: Knowledge Gradient Methods for Bayesian Optimization
  Peter Frazier
- 2017 Workshop: Bayesian Deep Learning
  Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Diederik Kingma · Zoubin Ghahramani · Kevin Murphy · Max Welling
- 2017 Poster: Bayesian GAN
  Yunus Saatci · Andrew Wilson
- 2017 Poster: Multi-Information Source Optimization
  Matthias Poloczek · Jialei Wang · Peter Frazier
- 2017 Spotlight: Multi-Information Source Optimization
  Matthias Poloczek · Jialei Wang · Peter Frazier
- 2017 Spotlight: Bayesian GAN
  Yunus Saatci · Andrew Wilson
- 2017 Poster: Scalable Log Determinants for Gaussian Process Kernel Learning
  Kun Dong · David Eriksson · Hannes Nickisch · David Bindel · Andrew Wilson
- 2017 Poster: Scalable Lévy Process Priors for Spectral Kernel Learning
  Phillip Jang · Andrew Loeb · Matthew Davidow · Andrew Wilson
- 2016 Poster: The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
  Jian Wu · Peter Frazier