Peter Grünwald - A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
Sat Dec 09 09:30 AM -- 10:15 AM (PST)
Event URL: https://homepages.cwi.nl/~pdg/
Over the last 15 years, machine learning theorists have bounded the performance of empirical risk minimization by (localized) Rademacher complexity; Bayesians with frequentist sympathies have studied Bayesian consistency and rate-of-convergence theorems, sometimes under misspecification; and PAC-Bayesians have studied convergence properties of generalized Bayesian and Gibbs posteriors. We show that, amazingly, most such bounds readily follow from essentially a single result that bounds excess risk in terms of a novel complexity COMP$(\eta,w)$, which depends on a learning rate $\eta$ and a luckiness function $w$, the latter generalizing the concept of a 'prior'. Depending on the choice of $w$, COMP$(\eta,w)$ specializes to PAC-Bayesian KL(posterior||prior) complexity, MDL (normalized maximum likelihood) complexity, and Rademacher complexity, and the bounds obtained are optimized for generalized Bayes, ERM, penalized ERM (such as the Lasso) or other methods. Tuning $\eta$ leads to optimal excess risk convergence rates, even for very large (polynomial entropy) classes, which have always been problematic for the PAC-Bayesian approach; the optimal $\eta$ depends on 'fast rate' properties of the domain, such as central, Bernstein and Tsybakov conditions. Joint work with Nishant Mehta, University of Victoria. See https://arxiv.org/abs/1710.07732
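For orientation, one classical bound of the kind that COMP$(\eta,w)$ recovers as its PAC-Bayesian special case is the McAllester/Maurer-style inequality (a standard textbook form, not the paper's exact theorem; here $R$ is true risk, $\hat R_n$ is empirical risk over an i.i.d. sample of size $n$, $\pi$ is a prior and $\rho$ any posterior over the class, with losses in $[0,1]$): with probability at least $1-\delta$, simultaneously for all $\rho$,

$$\mathbb{E}_{f\sim\rho}[R(f)] \;\le\; \mathbb{E}_{f\sim\rho}[\hat R_n(f)] \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi)+\ln(2\sqrt{n}/\delta)}{2n}}.$$

Per the abstract above, one choice of $w$ makes COMP$(\eta,w)$ specialize to the KL term in this bound, while tuning $\eta$ replaces the fixed $\sqrt{1/n}$ rate with faster rates under central, Bernstein or Tsybakov conditions; see arXiv:1710.07732 for the precise statement.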
Author Information
Peter Grünwald (CWI and Leiden University)
More from the Same Authors
- 2019 Poster: PAC-Bayes Un-Expected Bernstein Inequality
  Zakaria Mhammedi · Peter Grünwald · Benjamin Guedj
- 2016: Safe Probability
  Peter Grünwald
- 2016: (Ir-)rationality of human decision making
  Peter Grünwald
- 2016 Poster: Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
  Wouter Koolen · Peter Grünwald · Tim van Erven
- 2015: Discussion Panel
  Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu
- 2015: Easy Data
  Peter Grünwald
- 2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)
  Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar
- 2014 Poster: Learning the Learning Rate for Prediction with Expert Advice
  Wouter M Koolen · Tim van Erven · Peter Grünwald
- 2013 Workshop: Learning Faster From Easy Data
  Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck
- 2012 Poster: Mixability in Statistical Learning
  Tim van Erven · Peter Grünwald · Mark Reid · Robert Williamson
- 2011 Poster: Adaptive Hedge
  Tim van Erven · Peter Grünwald · Wouter M Koolen · Steven de Rooij
- 2007 Spotlight: Catching Up Faster in Bayesian Model Selection and Model Averaging
  Tim van Erven · Peter Grünwald · Steven de Rooij
- 2007 Poster: Catching Up Faster in Bayesian Model Selection and Model Averaging
  Tim van Erven · Peter Grünwald · Steven de Rooij