One of the central features of modern machine learning models, including deep neural networks, is their ability to generalize on structured data in the over-parametrized regime. In this work, we consider an analytically solvable setup to investigate how properties of the data impact learning in classification problems, and compare the results obtained for quadratic loss and logistic loss. Using methods from statistical physics, we obtain a precise asymptotic expression for the train and test errors of random feature models trained on a simple model of structured data. The input covariance is built from independent blocks, allowing us to tune the saliency of low-dimensional structures and their alignment with respect to the target function. Our results show in particular that in the over-parametrized regime, the impact of data structure on both train and test error curves is greater for logistic loss than for mean-squared loss: the easier the task, the wider the performance gap between the two losses, to the advantage of the logistic. Numerical experiments on MNIST and CIFAR10 confirm our insights.
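The setup described in the abstract can be illustrated with a small numerical sketch: Gaussian inputs whose covariance has a high-variance "salient" block aligned with the teacher, a fixed random feature map, and two students trained with quadratic loss (ridge regression on the labels) and logistic loss respectively. All dimensions, variances, and regularization values below are illustrative assumptions, not the paper's exact parameters, and the asymptotic theory uses different tools than this finite-size simulation.

```python
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression

rng = np.random.default_rng(0)

# Block-structured input covariance: a small salient block with large
# variance plus a large low-variance bulk (illustrative values).
d_salient, d_bulk = 20, 180
d = d_salient + d_bulk
variances = np.concatenate([np.full(d_salient, 10.0), np.full(d_bulk, 1.0)])

# Teacher aligned with the salient block only.
teacher = np.zeros(d)
teacher[:d_salient] = rng.standard_normal(d_salient)

def sample(n):
    X = rng.standard_normal((n, d)) * np.sqrt(variances)
    y = np.sign(X @ teacher)  # binary labels from the linear teacher
    return X, y

n_train, p = 300, 600  # p > n_train: over-parametrized feature count
X_tr, y_tr = sample(n_train)
X_te, y_te = sample(2000)

# Fixed random feature map phi(x) = relu(F x / sqrt(d)).
F = rng.standard_normal((d, p))
phi = lambda X: np.maximum(X @ F / np.sqrt(d), 0.0)

# Quadratic loss vs logistic loss on the same features.
ridge = Ridge(alpha=1e-3).fit(phi(X_tr), y_tr)
logit = LogisticRegression(C=1e3, max_iter=5000).fit(phi(X_tr), y_tr)

err_mse = np.mean(np.sign(ridge.predict(phi(X_te))) != y_te)
err_log = np.mean(logit.predict(phi(X_te)) != y_te)
print(f"test error  mse: {err_mse:.3f}   logistic: {err_log:.3f}")
```

Sweeping the salient-block variance (the "saliency") while keeping everything else fixed is the finite-size analogue of the comparison the paper carries out analytically.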
Author Information
Stéphane d'Ascoli (ENS Paris / Meta AI)
Currently a joint Ph.D. student between ENS (supervised by Giulio Biroli) and FAIR (supervised by Levent Sagun), working on the theory of deep learning.
Marylou Gabrié (NYU / Flatiron Institute)
Levent Sagun (EPFL)
Giulio Biroli (Ecole Normale Superieure)
More from the Same Authors
- 2022 Poster: End-to-end Symbolic Regression with Transformers »
  Pierre-alexandre Kamienny · Stéphane d'Ascoli · Guillaume Lample · Francois Charton
- 2022 Poster: Local-Global MCMC kernels: the best of both worlds »
  Sergey Samsonov · Evgeny Lagutin · Marylou Gabrié · Alain Durmus · Alexey Naumov · Eric Moulines
- 2020 Poster: Triple descent and the two kinds of overfitting: where & why do they appear? »
  Stéphane d'Ascoli · Levent Sagun · Giulio Biroli
- 2020 Spotlight: Triple descent and the two kinds of overfitting: where & why do they appear? »
  Stéphane d'Ascoli · Levent Sagun · Giulio Biroli
- 2019 Poster: Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias »
  Stéphane d'Ascoli · Levent Sagun · Giulio Biroli · Joan Bruna
- 2018 Poster: Entropy and mutual information in models of deep neural networks »
  Marylou Gabrié · Andre Manoel · Clément Luneau · jean barbier · Nicolas Macris · Florent Krzakala · Lenka Zdeborová
- 2018 Spotlight: Entropy and mutual information in models of deep neural networks »
  Marylou Gabrié · Andre Manoel · Clément Luneau · jean barbier · Nicolas Macris · Florent Krzakala · Lenka Zdeborová
- 2015 Poster: Training Restricted Boltzmann Machine via the Thouless-Anderson-Palmer free energy »
  Marylou Gabrie · Eric W Tramel · Florent Krzakala