Numerical influence of ReLU'(0) on backpropagation »
In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence on both backpropagation and training. Yet, in practice, the default 32-bit precision combined with the size of deep learning problems turns it into a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) at several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs, which occur around half of the time at 32-bit precision. The effect disappears at double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 appears to be the most efficient. In our experiments on ImageNet, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also show that reconditioning approaches such as batch normalization or Adam tend to buffer the influence of the value of ReLU'(0). Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
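The phenomenon described in the abstract is easy to reproduce: automatic differentiation frameworks must assign some value from the subdifferential [0, 1] to the derivative of ReLU at 0, and whenever a pre-activation lands exactly on 0 (which, as the paper reports, happens surprisingly often at 32-bit precision), that choice changes the output of backpropagation. Below is a minimal PyTorch sketch, our own illustration rather than the authors' code; the name ReLUAlpha and the parameter alpha are hypothetical. It defines a ReLU whose backward pass uses a chosen value for ReLU'(0) and shows that the two common conventions yield different gradients on inputs sitting exactly at the kink.

```python
import torch


class ReLUAlpha(torch.autograd.Function):
    """ReLU whose derivative at exactly 0 is a chosen value alpha in [0, 1].

    Hypothetical illustration, not the paper's implementation.
    """

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Standard ReLU gradient away from the nonsmooth point ...
        grad = torch.where(x > 0, torch.ones_like(x), torch.zeros_like(x))
        # ... and the chosen subgradient alpha exactly at the kink x == 0.
        grad = torch.where(x == 0, torch.full_like(x, ctx.alpha), grad)
        return grad_output * grad, None  # no gradient for alpha


# Inputs sitting exactly at the nonsmooth point: the two conventions disagree.
x = torch.zeros(3, requires_grad=True)
for alpha in (0.0, 1.0):
    (g,) = torch.autograd.grad(ReLUAlpha.apply(x, alpha).sum(), x)
    print(f"ReLU'(0) = {alpha}: grad = {g.tolist()}")
# ReLU'(0) = 0.0: grad = [0.0, 0.0, 0.0]
# ReLU'(0) = 1.0: grad = [1.0, 1.0, 1.0]
```

On inputs that are exactly 0 the two runs return different gradients, which is precisely the degree of freedom the paper studies at scale across precisions, architectures, and datasets.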
Author Information
David Bertoin (IRT Saint Exupéry; Institut Supérieur de l'Aéronautique et de l'Espace)
Jérôme Bolte (Université Toulouse Capitole and TSE)
Sébastien Gerchinovitz (IRT Saint Exupéry)
Edouard Pauwels (IRIT)
More from the Same Authors
- 2022 Poster: Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning »
  David Bertoin · Adil Zouitine · Mehdi Zouitine · Emmanuel Rachelson
- 2022 Poster: Automatic differentiation of nonsmooth iterative algorithms »
  Jérôme Bolte · Edouard Pauwels · Samuel Vaiter
- 2022 Poster: A general approximation lower bound in $L^p$ norm, with applications to feed-forward neural networks »
  El Mehdi Achour · Armand Foucault · Sébastien Gerchinovitz · François Malgouyres
- 2021 Poster: Semialgebraic Representation of Monotone Deep Equilibrium Models and Applications to Certification »
  Tong Chen · Jean Lasserre · Victor Magron · Edouard Pauwels
- 2021 Poster: Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates »
  François Bachoc · Tom Cesari · Sébastien Gerchinovitz
- 2021 Poster: Nonsmooth Implicit Differentiation for Machine-Learning and Optimization »
  Jérôme Bolte · Tam Le · Edouard Pauwels · Tony Silveti-Falls
- 2020 Poster: A mathematical model for automatic differentiation in machine learning »
  Jérôme Bolte · Edouard Pauwels
- 2020 Spotlight: A mathematical model for automatic differentiation in machine learning »
  Jérôme Bolte · Edouard Pauwels
- 2016 Poster: Refined Lower Bounds for Adversarial Bandits »
  Sébastien Gerchinovitz · Tor Lattimore
- 2016 Poster: Sorting out typicality with the inverse moment matrix SOS polynomial »
  Edouard Pauwels · Jean Lasserre