While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are still poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning rates, and they also provide practical insights. A novel key result is that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Further, we show that combining tail averaging and mini-batching allows more aggressive step-size choices than using either component alone.
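The combination of multiple passes, mini-batching and tail averaging described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or analysis: it assumes a finite-dimensional linear model instead of the nonparametric setting, a constant step size, and mini-batches sampled with replacement. The function name `tail_averaged_sgd` and the parameters `step_size`, `batch_size`, `n_passes` and `tail_fraction` are hypothetical names chosen for illustration.

```python
import numpy as np


def tail_averaged_sgd(X, y, step_size, batch_size, n_passes,
                      tail_fraction=0.5, seed=0):
    """Mini-batch SGD for least squares, returning a tail-averaged iterate.

    Assumption: linear model w -> X @ w, constant step size, and
    mini-batches drawn uniformly with replacement.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # One "pass" is counted as n // batch_size mini-batch steps.
    n_steps = n_passes * (n // batch_size)
    # Iterates from this step onward enter the average.
    # tail_fraction=1.0 averages all iterates (uniform averaging).
    tail_start = int((1.0 - tail_fraction) * n_steps)

    w = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(n_steps):
        idx = rng.integers(0, n, size=batch_size)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size  # mini-batch gradient
        w = w - step_size * grad
        if t >= tail_start:
            w_sum += w
    return w_sum / (n_steps - tail_start)
```

Calling the function with `tail_fraction=1.0` and with `tail_fraction=0.5` on the same data gives a simple way to compare uniform and tail averaging empirically. Suitable values for `step_size` and `batch_size` depend on the data and are not prescribed by this sketch.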
Author Information
Nicole Muecke (University of Stuttgart)
Gergely Neu (Universitat Pompeu Fabra)
Lorenzo Rosasco (University of Genova - MIT - IIT)
More from the Same Authors
-
2022 : Scalable Causal Discovery with Score Matching »
Francesco Montagna · Nicoletta Noceti · Lorenzo Rosasco · Kun Zhang · Francesco Locatello -
2022 Poster: Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits »
Gergely Neu · Iuliia Olkhovskaia · Matteo Papini · Ludovic Schwartz -
2022 Poster: Proximal Point Imitation Learning »
Luca Viano · Angeliki Kamoutsi · Gergely Neu · Igor Krawczuk · Volkan Cevher -
2022 Poster: Learning Dynamical Systems via Koopman Operator Regression in Reproducing Kernel Hilbert Spaces »
Vladimir Kostic · Pietro Novelli · Andreas Maurer · Carlo Ciliberto · Lorenzo Rosasco · Massimiliano Pontil -
2021 Poster: Online learning in MDPs with linear function approximation and bandit feedback. »
Gergely Neu · Iuliia Olkhovskaia -
2020 Poster: Kernel Methods Through the Roof: Handling Billions of Points Efficiently »
Giacomo Meanti · Luigi Carratino · Lorenzo Rosasco · Alessandro Rudi -
2020 Oral: Kernel Methods Through the Roof: Handling Billions of Points Efficiently »
Giacomo Meanti · Luigi Carratino · Lorenzo Rosasco · Alessandro Rudi -
2020 Poster: A Unifying View of Optimism in Episodic Reinforcement Learning »
Gergely Neu · Ciara Pike-Burke -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Implicit Regularization of Accelerated Methods in Hilbert Spaces »
Nicolò Pagliana · Lorenzo Rosasco -
2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates »
Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu -
2018 Poster: On Fast Leverage Score Sampling and Optimal Learning »
Alessandro Rudi · Daniele Calandriello · Luigi Carratino · Lorenzo Rosasco -
2018 Poster: Statistical and Computational Trade-Offs in Kernel K-Means »
Daniele Calandriello · Lorenzo Rosasco -
2018 Poster: Learning with SGD and Random Features »
Luigi Carratino · Alessandro Rudi · Lorenzo Rosasco -
2018 Spotlight: Statistical and Computational Trade-Offs in Kernel K-Means »
Daniele Calandriello · Lorenzo Rosasco -
2018 Spotlight: Learning with SGD and Random Features »
Luigi Carratino · Alessandro Rudi · Lorenzo Rosasco -
2018 Poster: Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification »
Dimitrios Milios · Raffaello Camoriano · Pietro Michiardi · Lorenzo Rosasco · Maurizio Filippone -
2018 Poster: Manifold Structured Prediction »
Alessandro Rudi · Carlo Ciliberto · Gian Maria Marconi · Lorenzo Rosasco -
2017 Poster: Boltzmann Exploration Done Right »
Nicolò Cesa-Bianchi · Claudio Gentile · Gergely Neu · Gabor Lugosi -
2017 Poster: Generalization Properties of Learning with Random Features »
Alessandro Rudi · Lorenzo Rosasco -
2017 Oral: Generalization Properties of Learning with Random Features »
Alessandro Rudi · Lorenzo Rosasco -
2017 Poster: Consistent Multitask Learning with Nonlinear Output Relations »
Carlo Ciliberto · Alessandro Rudi · Lorenzo Rosasco · Massimiliano Pontil -
2017 Poster: FALKON: An Optimal Large Scale Kernel Method »
Alessandro Rudi · Luigi Carratino · Lorenzo Rosasco -
2016 Poster: A Consistent Regularization Approach for Structured Prediction »
Carlo Ciliberto · Lorenzo Rosasco · Alessandro Rudi -
2016 Poster: Optimal Learning for Multi-pass Stochastic Gradient Methods »
Junhong Lin · Lorenzo Rosasco -
2015 : Discussion Panel »
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu -
2015 : Adaptive Regret Bounds for Non-Stochastic Bandits »
Gergely Neu -
2015 Poster: Learning with Incremental Iterative Regularization »
Lorenzo Rosasco · Silvia Villa -
2015 Poster: Less is More: Nyström Computational Regularization »
Alessandro Rudi · Raffaello Camoriano · Lorenzo Rosasco -
2015 Oral: Less is More: Nyström Computational Regularization »
Alessandro Rudi · Raffaello Camoriano · Lorenzo Rosasco -
2015 Poster: Explore no more: Improved high-probability regret bounds for non-stochastic bandits »
Gergely Neu -
2014 Poster: Exploiting easy data in online optimization »
Amir Sani · Gergely Neu · Alessandro Lazaric -
2014 Poster: Efficient learning by implicit exploration in bandit problems with side observations »
Tomáš Kocák · Gergely Neu · Michal Valko · Remi Munos -
2014 Spotlight: Exploiting easy data in online optimization »
Amir Sani · Gergely Neu · Alessandro Lazaric -
2014 Poster: Online combinatorial optimization with stochastic decision sets and adversarial losses »
Gergely Neu · Michal Valko -
2013 Workshop: Modern Nonparametric Methods in Machine Learning »
Arthur Gretton · Mladen Kolar · Samory Kpotufe · John Lafferty · Han Liu · Bernhard Schölkopf · Alexander Smola · Rob Nowak · Mikhail Belkin · Lorenzo Rosasco · Peter Bickel · Yue Zhao -
2013 Poster: Online learning in episodic Markovian decision processes by relative entropy policy search »
Alexander Zimin · Gergely Neu -
2013 Poster: On the Sample Complexity of Subspace Learning »
Alessandro Rudi · Guillermo D Canas · Lorenzo Rosasco -
2012 Poster: Learning Manifolds with K-Means and K-Flats »
Guillermo D Canas · Tomaso Poggio · Lorenzo Rosasco -
2012 Poster: Multiclass Learning with Simplex Coding »
Youssef Mroueh · Tomaso Poggio · Lorenzo Rosasco · Jean-Jacques Slotine -
2012 Poster: Learning Probability Measures with respect to Optimal Transport Metrics »
Guillermo D Canas · Lorenzo Rosasco -
2010 Spotlight: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · András Antos · Csaba Szepesvari -
2010 Poster: A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups »
Sofia Mosci · Silvia Villa · Alessandro Verri · Lorenzo Rosasco -
2010 Poster: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · Csaba Szepesvari · András Antos -
2010 Poster: Spectral Regularization for Support Estimation »
Ernesto De Vito · Lorenzo Rosasco · Alessandro Toigo -
2009 Workshop: Kernels for Multiple Outputs and Multi-task Learning: Frequentist and Bayesian Points of View »
Mauricio A Alvarez · Lorenzo Rosasco · Neil D Lawrence -
2009 Poster: On Invariance in Hierarchical Models »
Jake Bouvrie · Lorenzo Rosasco · Tomaso Poggio