Our workshop focuses on optimization theory and practice that is relevant to machine learning. This proposal builds on the precedent established by two of our previous, well-received NIPS workshops:
(@NIPS08) http://opt2008.kyb.tuebingen.mpg.de/
(@NIPS09) http://opt.kyb.tuebingen.mpg.de/
Both these workshops had packed (often overpacked) attendance almost throughout the day. This enthusiastic reception reflects the strong interest, relevance, and importance enjoyed by optimization in the greater ML community.
One could ask: why does optimization attract such continued interest? The answer is simple but telling: optimization lies at the heart of almost every ML algorithm. For some algorithms textbook methods suffice, but the majority require tailoring algorithmic tools from optimization, which in turn depends on a deeper understanding of the ML requirements. In fact, ML applications and researchers are driving some of the most cutting-edge developments in optimization today. The intimate relation of optimization with ML is the key motivation for our workshop, which aims to foster discussion, discovery, and dissemination of the state of the art in optimization, especially in the context of ML.
The workshop should realize its aims by:
* Providing a platform for increasing the interaction between researchers from optimization, operations research, statistics, scientific computing, and machine learning;
* Identifying key problems and challenges that lie at the intersection of optimization and ML;
* Narrowing the gap between optimization and ML, to help reduce rediscovery and thereby accelerate new advances.
ADDITIONAL BACKGROUND AND MOTIVATION
Previous talks at the OPT workshops have covered frameworks for convex programs (D. Bertsekas), the intersection of ML and optimization, especially in the area of SVM training (S. Wright), large-scale learning via stochastic gradient methods and its tradeoffs (L. Bottou, N. Srebro), exploitation of structured sparsity in optimization (L. Vandenberghe), and randomized methods for extremely large-scale convex optimization (A. Nemirovski). Several important realizations were brought to the fore by these talks, and many of the dominant ideas will appear in our book (to be published by MIT Press) on Optimization for Machine Learning.
Given the above background it is easy to acknowledge that optimization is indispensable to machine learning. But what more can we say beyond this obvious realization?
The ML community's interest in optimization continues to grow. Invited tutorials on optimization will be presented this year at ICML (N. Srebro) and NIPS (S. Wright). The traditional "point of contact" between ML and optimization, the SVM, continues to be a driver of research on a number of fronts. Much interest has focused recently on stochastic gradient methods, which can be used in an online setting and in settings where data sets are extremely large and high accuracy is not required. Regularized logistic regression is another area that has produced a recent flurry of activity at the intersection of the two communities. Many aspects of stochastic gradient methods remain to be explored, for example: different algorithmic variants, customization to the data set structure, convergence analysis, sampling techniques, software, choice of regularization and tradeoff parameters, and parallelism. There also needs to be a better understanding of the limitations of these methods, and of what can be done to accelerate them or to detect when to switch to alternative strategies. In the logistic regression setting, use of approximate second-order information has been shown to improve convergence, but many algorithmic issues remain. Detection of combined-effects predictors (which lead to a huge increase in the number of variables), use of group regularizers, and the need to handle very large data sets in real time all present challenges.
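As a rough illustration of the stochastic gradient setting mentioned above, the following minimal Python sketch runs a single online pass of SGD on L2-regularized logistic regression. Everything specific here is an assumption made for the example and is not taken from any workshop talk or paper: the synthetic data stream, the 1/sqrt(t) step-size schedule, and the names `sgd_logistic`, `make_stream`, `lam`, and `step`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sgd_logistic(stream, dim, lam=1e-3, step=0.5):
    """One pass of SGD over (x, y) pairs with y in {0, 1}.

    lam (L2 regularization weight) and step (base step size) are
    illustrative defaults, not recommended values.
    """
    w = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        eta = step / math.sqrt(t)  # assumed decaying step-size schedule
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # Gradient of the log-loss plus (lam / 2) * ||w||^2 on this one example.
        w = [wi - eta * ((p - y) * xi + lam * wi) for wi, xi in zip(w, x)]
    return w


def make_stream(n, seed=0):
    """Hypothetical synthetic data: the label is 1 exactly when x[0] > x[1]."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry acts as a bias feature
        yield x, 1 if x[0] > x[1] else 0


# Each example is seen once and then discarded, as in an online setting.
w = sgd_logistic(make_stream(5000), dim=3)
print(w)
```

Because each update touches only one example, the cost per step does not depend on the size of the data set, which is why such methods suit very large or streaming data when a moderately accurate solution is enough.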
To avoid becoming lopsided, our workshop will also admit the "not particularly large scale" setting, where one has time to wield substantial computational resources. In this setting, high-accuracy solutions and a deep understanding of the lessons contained in the data are needed. Examples of value to ML researchers include the exploration of genetic and environmental data to identify risk factors for disease, and problems where the amount of observed data is not huge but the mathematical models are complex.
Author Information
Suvrit Sra (MIT)
Suvrit Sra is a faculty member within the EECS department at MIT, where he is also a core faculty member of IDSS, LIDS, the MIT-ML Group, and the statistics and data science center. His research spans topics in optimization, matrix theory, differential geometry, and probability theory, which he connects with machine learning; a key focus of his research is the theme "Optimization for Machine Learning" (http://optml.org).
Sebastian Nowozin (Microsoft Research)
Stephen Wright (UW-Madison)
Steve Wright is a Professor of Computer Sciences at the University of Wisconsin-Madison. His research interests lie in computational optimization and its applications to science and engineering. Prior to joining UW-Madison in 2001, Wright was a Senior Computer Scientist (1997-2001) and Computer Scientist (1990-1997) at Argonne National Laboratory, and Professor of Computer Science at the University of Chicago (2000-2001). He is the past Chair of the Mathematical Optimization Society (formerly the Mathematical Programming Society), the leading professional society in optimization, and a member of the Board of the Society for Industrial and Applied Mathematics (SIAM). Wright is the author or co-author of four widely used books in numerical optimization, including "Primal-Dual Interior-Point Methods" (SIAM, 1997) and "Numerical Optimization" (with J. Nocedal, Second Edition, Springer, 2006). He has also authored over 85 refereed journal papers on optimization theory, algorithms, software, and applications. He is co-author of widely used interior-point software for linear and quadratic optimization. His recent research includes algorithms, applications, and theory for sparse optimization (including applications in compressed sensing and machine learning).
More from the Same Authors

2022 : BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach »
Mao Ye · Bo Liu · Stephen Wright · Peter Stone · Qiang Liu 
2023 Poster: The Curious Role of Normalization in Sharpness-Aware Minimization »
Yan Dai · Kwangjun Ahn · Suvrit Sra 
2023 Poster: Transformers learn to implement preconditioned gradient descent for in-context learning »
Kwangjun Ahn · Xiang Cheng · Hadi Daneshmand · Suvrit Sra 
2023 Poster: Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing »
Shuyao Li · Yu Cheng · Ilias Diakonikolas · Jelena Diakonikolas · Rong Ge · Stephen Wright 
2022 Poster: CCCP is Frank-Wolfe in disguise »
Alp Yurtsever · Suvrit Sra 
2022 Poster: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach »
Bo Liu · Mao Ye · Stephen Wright · Peter Stone · Qiang Liu 
2022 Poster: Coordinate Linear Variance Reduction for Generalized Linear Programming »
Chaobing Song · Cheuk Yin Lin · Stephen Wright · Jelena Diakonikolas 
2022 Poster: Efficient Sampling on Riemannian Manifolds via Langevin MCMC »
Xiang Cheng · Jingzhao Zhang · Suvrit Sra 
2021 Poster: Can contrastive learning avoid shortcut solutions? »
Joshua Robinson · Li Sun · Ke Yu · Kayhan Batmanghelich · Stefanie Jegelka · Suvrit Sra 
2021 Poster: Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates »
Alp Yurtsever · Alex Gu · Suvrit Sra 
2020 : Invited speaker: SGD without replacement: optimal rate analysis and more, Suvrit Sra »
Suvrit Sra 
2020 Poster: SGD with shuffling: optimal rates without component convexity and large epoch requirements »
Kwangjun Ahn · Chulhee Yun · Suvrit Sra 
2020 Spotlight: SGD with shuffling: optimal rates without component convexity and large epoch requirements »
Kwangjun Ahn · Chulhee Yun · Suvrit Sra 
2020 Poster: Why are Adaptive Methods Good for Attention Models? »
Jingzhao Zhang · Sai Praneeth Karimireddy · Andreas Veit · Seungyeon Kim · Sashank Reddi · Sanjiv Kumar · Suvrit Sra 
2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu 
2020 Poster: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes »
Yi Tian · Jian Qian · Suvrit Sra 
2020 Spotlight: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes »
Yi Tian · Jian Qian · Suvrit Sra 
2019 : Second-order methods for nonconvex optimization with complexity guarantees »
Stephen Wright 
2019 Poster: Flexible Modeling of Diversity with Strongly Log-Concave Distributions »
Joshua Robinson · Suvrit Sra · Stefanie Jegelka 
2019 Poster: Are deep ResNets provably better than linear predictors? »
Chulhee Yun · Suvrit Sra · Ali Jadbabaie 
2019 Poster: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity »
Chulhee Yun · Suvrit Sra · Ali Jadbabaie 
2019 Spotlight: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity »
Chulhee Yun · Suvrit Sra · Ali Jadbabaie 
2018 Workshop: Smooth Games Optimization and Machine Learning »
Simon Lacoste-Julien · Ioannis Mitliagkas · Gauthier Gidel · Vasilis Syrgkanis · Eva Tardos · Leon Bottou · Sebastian Nowozin 
2018 Poster: Direct Runge-Kutta Discretization Achieves Acceleration »
Jingzhao Zhang · Aryan Mokhtari · Suvrit Sra · Ali Jadbabaie 
2018 Spotlight: Direct Runge-Kutta Discretization Achieves Acceleration »
Jingzhao Zhang · Aryan Mokhtari · Suvrit Sra · Ali Jadbabaie 
2018 Poster: Exponentiated Strongly Rayleigh Distributions »
Zelda Mariet · Suvrit Sra · Stefanie Jegelka 
2018 Poster: ATOMO: Communication-efficient Learning via Atomic Sparsification »
Hongyi Wang · Scott Sievert · Shengchao Liu · Zachary Charles · Dimitris Papailiopoulos · Stephen Wright 
2018 Tutorial: Negative Dependence, Stable Polynomials, and All That »
Suvrit Sra · Stefanie Jegelka 
2017 Workshop: OPT 2017: Optimization for Machine Learning »
Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht 
2017 Poster: The Numerics of GANs »
Lars Mescheder · Sebastian Nowozin · Andreas Geiger 
2017 Spotlight: The Numerics of GANs »
Lars Mescheder · Sebastian Nowozin · Andreas Geiger 
2017 Poster: Elementary Symmetric Polynomials for Optimal Experimental Design »
Zelda Mariet · Suvrit Sra 
2017 Poster: k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms »
Cong Han Lim · Stephen Wright 
2017 Poster: Stabilizing Training of Generative Adversarial Networks through Regularization »
Kevin Roth · Aurelien Lucchi · Sebastian Nowozin · Thomas Hofmann 
2017 Poster: Polynomial time algorithms for dual volume sampling »
Chengtao Li · Stefanie Jegelka · Suvrit Sra 
2016 Workshop: OPT 2016: Optimization for Machine Learning »
Suvrit Sra · Francis Bach · Sashank J. Reddi · Niao He 
2016 : Discussion panel »
Ian Goodfellow · Soumith Chintala · Arthur Gretton · Sebastian Nowozin · Aaron Courville · Yann LeCun · Emily Denton 
2016 : Taming nonconvexity via geometry »
Suvrit Sra 
2016 : Training Generative Neural Samplers using Variational Divergence »
Sebastian Nowozin 
2016 Poster: Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling »
Chengtao Li · Suvrit Sra · Stefanie Jegelka 
2016 Poster: Kronecker Determinantal Point Processes »
Zelda Mariet · Suvrit Sra 
2016 Poster: f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization »
Sebastian Nowozin · Botond Cseke · Ryota Tomioka 
2016 Poster: Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization »
Sashank J. Reddi · Suvrit Sra · Barnabas Poczos · Alexander Smola 
2016 Poster: Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds »
Hongyi Zhang · Sashank J. Reddi · Suvrit Sra 
2016 Poster: DISCO Nets : DISsimilarity COefficients Networks »
Diane Bouchacourt · Pawan K Mudigonda · Sebastian Nowozin 
2016 Tutorial: Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity »
Suvrit Sra · Francis Bach 
2015 Workshop: Optimization for Machine Learning (OPT2015) »
Suvrit Sra · Alekh Agarwal · Leon Bottou · Sashank J. Reddi 
2015 Poster: Matrix Manifold Optimization for Gaussian Mixtures »
Reshad Hosseini · Suvrit Sra 
2015 Poster: On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants »
Sashank J. Reddi · Ahmed Hefny · Suvrit Sra · Barnabas Poczos · Alexander Smola 
2014 Workshop: Discrete Optimization in Machine Learning »
Jeffrey A Bilmes · Andreas Krause · Stefanie Jegelka · S Thomas McCormick · Sebastian Nowozin · Yaron Singer · Dhruv Batra · Volkan Cevher 
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck 
2014 Poster: Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems »
Cong Han Lim · Stephen Wright 
2014 Poster: Efficient Structured Matrix Rank Minimization »
Adams Wei Yu · Wanli Ma · Yaoliang Yu · Jaime Carbonell · Suvrit Sra 
2013 Workshop: OPT2013: Optimization for Machine Learning »
Suvrit Sra · Alekh Agarwal 
2013 Poster: Decision Jungles: Compact and Rich Models for Classification »
Jamie Shotton · Toby Sharp · Pushmeet Kohli · Sebastian Nowozin · John Winn · Antonio Criminisi 
2013 Poster: Geometric optimisation on positive definite matrices for elliptically contoured distributions »
Suvrit Sra · Reshad Hosseini 
2013 Poster: Reflection methods for user-friendly submodular optimization »
Stefanie Jegelka · Francis Bach · Suvrit Sra 
2013 Poster: An Approximate, Efficient LP Solver for LP Rounding »
Srikrishna Sridhar · Stephen Wright · Christopher Re · Ji Liu · Victor Bittorf · Ce Zhang 
2012 Workshop: Log-Linear Models »
Dimitri Kanevsky · Tony Jebara · Li Deng · Stephen Wright · Georg Heigold · Avishy Carmi 
2012 Workshop: Optimization for Machine Learning »
Suvrit Sra · Alekh Agarwal 
2012 Poster: A new metric on the manifold of kernel matrices with application to matrix geometric means »
Suvrit Sra 
2012 Poster: Scalable nonconvex inexact proximal splitting »
Suvrit Sra 
2011 Workshop: Optimization for Machine Learning »
Suvrit Sra · Stephen Wright · Sebastian Nowozin 
2011 Poster: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Re · Stephen Wright · Feng Niu 
2011 Poster: Higher-Order Correlation Clustering for Image Segmentation »
Sungwoong Kim · Sebastian Nowozin · Pushmeet Kohli · Chang D. D Yoo 
2010 Workshop: Numerical Mathematics Challenges in Machine Learning »
Matthias Seeger · Suvrit Sra 
2010 Tutorial: Optimization Algorithms in Machine Learning »
Stephen Wright 
2009 Workshop: Optimization for Machine Learning »
Sebastian Nowozin · Suvrit Sra · S.V.N. Vishwanathan · Stephen Wright 
2008 Workshop: Optimization for Machine Learning »
Suvrit Sra · Sebastian Nowozin · S.V.N. Vishwanathan