In recent years, the transformer has established itself as a workhorse in many applications ranging from natural language processing to reinforcement learning. Similarly, Bayesian deep learning has become the gold standard for uncertainty estimation in safety-critical applications, where robustness and calibration are crucial. Surprisingly, no successful attempts to improve transformer models in terms of predictive uncertainty using Bayesian inference exist. In this work, we study this curiously underpopulated area of Bayesian transformers. We find that weight-space inference in transformers does not work well, regardless of the approximate posterior. We also find that the prior is at least partially at fault, but that it is very hard to find well-specified weight priors for these models. We hypothesize that these problems stem from the complexity of obtaining a meaningful mapping from weight-space to function-space distributions in the transformer. Therefore, moving closer to function-space, we propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights. We find that this proposed method performs competitively with our baselines.
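The core idea above, sampling attention weights from a Dirichlet distribution with gradients obtained via implicit reparameterization, can be illustrated with a minimal sketch. This is not the paper's exact parameterization: the softplus mapping from attention scores to Dirichlet concentrations, the temperature `tau`, and the function name `dirichlet_attention` are assumptions for illustration. PyTorch's `Dirichlet.rsample` implements implicit reparameterization gradients, so the sampled weights remain differentiable with respect to the scores.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Dirichlet

def dirichlet_attention(q, k, v, tau=1.0):
    """Stochastic attention sketch: instead of a softmax over scaled
    dot-product scores, sample the attention weights from a Dirichlet
    whose concentration is derived from those scores. rsample() uses
    implicit reparameterization, keeping the sample differentiable."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (..., L, L) attention scores
    conc = F.softplus(scores) / tau + 1e-4        # positive concentration (assumed mapping)
    weights = Dirichlet(conc).rsample()           # each row sums to 1
    return weights @ v, weights

# Hypothetical usage on random queries/keys/values
q = torch.randn(2, 5, 8)
out, w = dirichlet_attention(q, q, torch.randn(2, 5, 8))
```

Because the Dirichlet is a distribution over the simplex, each sampled row of `w` is a valid attention distribution, and variational inference can then be applied to these weights directly rather than to the network parameters.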
Author Information
Tristan Cinquin (Swiss Federal Institute of Technology)
Alexander Immer (ETH Zurich)
Max Horn (Swiss Federal Institute of Technology)
Vincent Fortuin (ETH Zürich)
More from the Same Authors
- 2021 Spotlight: Repulsive Deep Ensembles are Bayesian »
  Francesco D'Angelo · Vincent Fortuin
- 2021: PCA Subspaces Are Not Always Optimal for Bayesian Learning »
  Alexandre Bense · Amir Joudaki · Tim G. J. Rudner · Vincent Fortuin
- 2021: Deep Classifiers with Label Noise Modeling and Distance Awareness »
  Vincent Fortuin · Mark Collier · Florian Wenzel · James Allingham · Jeremiah Liu · Dustin Tran · Balaji Lakshminarayanan · Jesse Berent · Rodolphe Jenatton · Effrosyni Kokiopoulou
- 2021 Poster: Laplace Redux - Effortless Bayesian Deep Learning »
  Erik Daxberger · Agustinus Kristiadi · Alexander Immer · Runa Eschenhagen · Matthias Bauer · Philipp Hennig
- 2021 Poster: Repulsive Deep Ensembles are Bayesian »
  Francesco D'Angelo · Vincent Fortuin
- 2019 Poster: Approximate Inference Turns Deep Networks into Gaussian Processes »
  Mohammad Emtiyaz Khan · Alexander Immer · Ehsan Abedi · Maciej Korzepa