We will re-examine two popular use cases of Bayesian approaches: model selection, and robustness to distribution shift.
The marginal likelihood (Bayesian evidence) provides a distinctive approach to resolving foundational scientific questions --- "how can we choose between models that are entirely consistent with any data?" and "how can we learn hyperparameters or constraints consistent with the ground truth, such as intrinsic dimensionalities or symmetries, if our training loss does not select for them?". There are compelling arguments that the marginal likelihood automatically encodes Occam's razor, and it has widespread practical applications, including the variational ELBO for hyperparameter learning. However, we will argue that the marginal likelihood answers a fundamentally different question from "will my trained model provide good generalization?". We examine the discrepancies between these questions and their significant practical implications in detail, as well as possible resolutions.
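For concreteness, these are the standard definitions of the quantities named above (not specific to this work): the marginal likelihood of a model $\mathcal{M}$ integrates the likelihood of the data $\mathcal{D}$ over the prior on parameters $w$, and the variational ELBO lower-bounds its logarithm for an approximate posterior $q(w)$:

```latex
p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid w, \mathcal{M})\, p(w \mid \mathcal{M})\, dw ,

\log p(\mathcal{D} \mid \mathcal{M}) \;\geq\; \mathbb{E}_{q(w)}\!\left[\log p(\mathcal{D} \mid w, \mathcal{M})\right] - \mathrm{KL}\!\left(q(w)\,\|\,p(w \mid \mathcal{M})\right).
```

The first integral averages the likelihood over the prior, which is why it penalizes models that spread prior mass over many hypotheses inconsistent with the data --- the formal sense in which it encodes Occam's razor.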
Moreover, it is often thought that Bayesian methods, by representing epistemic uncertainty, ought to produce more reasonable predictive distributions under covariate shift, since shifted points lie far from the data manifold. However, we were surprised to find that high-quality approximate Bayesian inference often leads to significantly decreased generalization performance. To understand these findings, we investigate fundamentally why Bayesian model averaging can deteriorate predictive performance under distribution and covariate shifts, and provide several remedies based on this understanding.
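The object under study here is the standard Bayesian model average: rather than predicting with a single point estimate $\hat{w}$, the posterior predictive distribution at a test input $x$ averages over the parameter posterior,

```latex
p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw ,
```

so that the more faithfully an approximate inference method captures $p(w \mid \mathcal{D})$, the more fully this average is realized --- which is precisely why it is surprising that higher-quality inference can hurt performance under covariate shift.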