To deploy deep learning in the wild responsibly, we must know when models are making unsubstantiated guesses. The field of Bayesian Deep Learning (BDL) has been a focal point in the ML community for the development of such tools. Big strides have been made in BDL in recent years, and the field is making an impact outside the ML community, in areas including astronomy, medical imaging, the physical sciences, and many others. But the field of BDL itself is facing an evaluation crisis: most BDL papers evaluate the uncertainty-estimation quality of new methods on MNIST and CIFAR alone, ignoring the needs of the real-world applications that use BDL. Therefore, apart from discussing the latest advances in BDL methodology, a particular focus of this year’s programme is the reliability of BDL techniques in downstream tasks. This focus is reflected in invited talks from practitioners in other fields and in collaboration with the two NeurIPS challenges in BDL — the Approximate Inference in Bayesian Deep Learning Challenge and the Shifts Challenge on Robustness and Uncertainty under Real-World Distributional Shift — showcasing work on applications including autonomous driving, medicine, space, and more. We hope that the mainstream BDL community will adopt real-world benchmarks based on such applications, pushing the field beyond MNIST and CIFAR evaluations.
Tue 3:00 a.m. - 3:10 a.m. | Opening Remarks (Zoom)
Tue 3:10 a.m. - 3:30 a.m. | Adaptive and Robust Learning with Bayes (Invited talk) | Emtiyaz Khan, Dharmesh Tailor, Siddharth Swaroop
Tue 3:30 a.m. - 3:50 a.m. | A Bayesian Perspective on Meta-Learning (Invited talk) | Yee Whye Teh
Tue 3:50 a.m. - 4:10 a.m. | Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift (Competition talk)
Tue 4:10 a.m. - 4:20 a.m. | Gaussian Dropout as an Information Bottleneck Layer (Contributed talk) | Melanie Rey
Tue 4:20 a.m. - 4:30 a.m. | Funnels: Exact Maximum Likelihood with Dimensionality Reduction (Contributed talk) | Samuel Klein
Tue 4:30 a.m. - 5:30 a.m. | Posters and lunch break (Poster, Gather.Town)
Tue 5:30 a.m. - 5:50 a.m. | Spacecraft Collision Avoidance with Bayesian Deep Learning (Invited talk) | Atılım Güneş Baydin, Francesco Pinto
Tue 5:50 a.m. - 6:10 a.m. | Inference & Sampling with Symmetries (Invited talk) | Danilo Rezende, Peter Wirnsberger
Tue 6:10 a.m. - 6:30 a.m. | Bayesian Neural Networks, Adversarial Attacks, and How the Amount of Samples Matters (Invited talk) | Asja Fischer, Sina Däubener
Tue 6:30 a.m. - 8:00 a.m. | Posters (Poster, Gather.Town)
Tue 8:00 a.m. - 8:20 a.m. | Quantified Uncertainty for Safe Operation of Particle Accelerators (Invited talk) | Adi Hanuka, Owen Convery
Tue 8:20 a.m. - 8:30 a.m. | Diversity is All You Need to Improve Bayesian Model Averaging (Contributed talk) | Yashvir Grewal
Tue 8:30 a.m. - 8:40 a.m. | Structured Stochastic Gradient MCMC: a hybrid VI and MCMC approach (Contributed talk) | Alex Boyd, Antonios Alexos
Tue 8:40 a.m. - 9:00 a.m. | Evaluating Approximate Inference in Bayesian Deep Learning (Competition talk)
Tue 9:00 a.m. - 9:20 a.m. | An Automatic Finite-Data Robustness Metric for Bayes and Beyond: Can Dropping a Little Data Change Conclusions? (Invited talk)
Tue 9:20 a.m. - 9:25 a.m. | Closing Remarks
Tue 9:25 a.m. - 11:00 a.m. | Social and Posters (Poster, Gather.Town)

Diversity is All You Need to Improve Bayesian Model Averaging (Poster)
Existing approximate inference techniques produce predictive distributions that are quite distinct from the predictive distribution of the gold-standard Hamiltonian Monte Carlo. In this work, we bring the predictive distribution produced by deep ensembles closer to the Hamiltonian Monte Carlo predictive distribution by increasing the diversity within the ensembles. The proposed approach outperforms existing approximate inference methods and is currently ranked highest in the Approximate Inference competition at NeurIPS 2021.
Yashvir Singh Grewal · Thang Bui

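As a point of reference for the two objects this abstract compares, here is a minimal sketch, with illustrative names and toy data rather than the authors' code, of a deep ensemble's predictive distribution (an equal-weight mixture of its members) and a simple pairwise-disagreement proxy for the diversity being increased:

```python
import numpy as np

def ensemble_predictive(member_probs):
    """Equal-weight mixture: the ensemble predictive distribution is the
    average of the members' predicted class probabilities."""
    return np.mean(member_probs, axis=0)

def pairwise_disagreement(member_probs):
    """A crude diversity proxy: mean total-variation distance between the
    predictions of every pair of ensemble members."""
    m = len(member_probs)
    dists = [0.5 * np.abs(member_probs[i] - member_probs[j]).sum(-1).mean()
             for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(dists))

# Toy setup: 4 members, 3 inputs, 5 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3, 5))
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print(ensemble_predictive(probs).shape)  # (3, 5)
print(pairwise_disagreement(probs))
```
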
Regularizations Are All You Need: Weather Prediction Under Distributional Shift (Poster)
In this paper, we present preliminary results on improving out-of-domain weather prediction and uncertainty estimation as part of the Shifts Challenge on Robustness and Uncertainty under Real-World Distributional Shift. Our preliminary results show that by leveraging an ensemble of Bayesian models and thoughtfully splitting the training set, we can achieve more robust and accurate results than standard libraries. We quantify our predictions using several metrics and propose several future lines of inquiry and experimentation to boost performance.
Sankalp Gilda · Neel Bhandari · Wendy Wing Yee Mak · Andrea Panizza

Reducing redundancy in Semantic-KITTI: Study on data augmentations within Active Learning (Poster)
Active learning has recently gained attention in deep learning tasks dedicated to autonomous driving, such as image classification. However, semantic segmentation of point clouds remains a largely unexplored task in active learning, mainly due to its heavy computational cost. In this paper, we present an analysis to reduce data redundancy in the large-scale Semantic-KITTI dataset using uncertainty-based active learning methods and data augmentation. We demonstrate that data augmentation techniques help our active learning cycles, achieving baseline accuracy with only 60% of the dataset.
Alexandre Almin · Anh Duong · Léo Lemarié · Ravi Kiran

An Empirical Analysis of Uncertainty Estimation in Genomics Applications (Poster)
The usability of machine learning solutions in critical real-world applications relies on the availability of an uncertainty measure that reflects the confidence in the model predictions. In this work, we present an empirical analysis of uncertainty estimation approaches in Deep Learning models. We contrast Bayesian Neural Networks (BNN) against Monte Carlo-dropout (MC-dropout) methods to evaluate their performance and uncertainty scores in two classification tasks with different dataset characteristics.
Sepideh Saran · Mahsa Ghanbari · Uwe Ohler

Hierarchical Topic Evaluation: Statistical vs. Neural Models (Poster)
Hierarchical topic models (HTMs), especially those based on Bayesian deep learning, are gaining increasing attention from the ML community. However, in contrast to their flat counterparts, their proper evaluation is rarely addressed. We propose several measures to evaluate HTMs in terms of their (branch-wise and layer-wise) topic hierarchy. We apply these measures to benchmark several HTMs on a wide range of datasets. We compare neural HTMs to traditional statistical HTMs in topic quality and interpretability. Our findings may help better judge advantages and disadvantages in different deep hierarchical topic models and drive future research in this area.
Mayank Kumar Nagda · Charu Karakkaparambil James · Sophie Burkhardt · Marius Kloft

Reflected Hamiltonian Monte Carlo (Poster)
The Hamiltonian Monte Carlo method is well-known for its ability to generate distant proposals and avoid random-walk behaviour. Its sampling efficiency, however, is highly sensitive to the choice of the number of leapfrog integration steps. Although the No-U-Turn Sampler automates the tuning of this parameter, it is computationally expensive and practically challenging to implement, especially on parallel architectures. In this work, we introduce the Reflected Hamiltonian Monte Carlo sampler, an HMC methodology that builds upon a reflection mechanism also used in the Bouncy Particle Sampler. The algorithm has an update-rate parameter that plays an analogous role to that of the number of leapfrog integration steps in Hamiltonian Monte Carlo. With a focus on high-dimensional classification tasks, we demonstrate the competitive performance of the proposed algorithm against well-tuned Hamiltonian-based Markov chain Monte Carlo methods.
Khai Xiang Au · Alexandre Thiery

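For readers less familiar with the tuning parameter at issue, below is a minimal sketch of one standard HMC transition; `n_leapfrog` is exactly the parameter whose role the paper's update-rate parameter takes over. The reflection mechanism itself is the paper's contribution and is not reproduced here.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size, n_leapfrog, rng):
    """One standard HMC transition with leapfrog integration and a
    Metropolis accept/reject correction."""
    p = rng.normal(size=q.shape)                        # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)     # initial half step
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new
        p_new += step_size * grad_log_prob(q_new)
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)     # final half step
    h_old = -log_prob(q) + 0.5 * p @ p                  # Hamiltonians
    h_new = -log_prob(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q
```
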
Federated Functional Variational Inference (Poster)
Traditional federated learning (FL) involves optimizing point estimates for the parameters of the server model via a maximum likelihood objective. While models trained with such objectives show competitive predictive accuracy, they are poorly calibrated and provide no reliable uncertainty estimates. Well-calibrated uncertainty is, however, important in safety-critical applications of FL such as self-driving cars and healthcare. In this work we propose several methods to train Bayesian neural networks, networks providing uncertainty over their model parameters, in FL. We introduce baseline methods that place priors on, and do inference over, the weight space of the network. We also propose two function-space inference methods, based on Federated Averaging (FedAvg) and Expectation-Maximization (EM), which build upon recent work in functional variational inference to posit prior distributions on, and do inference over, the function space of the network. We compare these function-space methods to their weight-space counterparts.
Michael Hutchinson · Matthias Reisser · Christos Louizos

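For orientation, the FedAvg step that two of the proposed methods build on is sketched below under simplifying assumptions (clients return full weight lists, aggregation is a data-size-weighted average); the paper's functional and Bayesian treatments replace this plain point-estimate average.

```python
import numpy as np

def fedavg_round(client_weights, client_sizes):
    """One FedAvg aggregation: average the clients' locally trained weights,
    weighting each client by the size of its local dataset."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [sum(n / total * w[k] for w, n in zip(client_weights, client_sizes))
            for k in range(n_layers)]

# Toy check: two clients with a 1:3 data split -> weighted average 2.5.
clients = [[np.ones((2, 2)), np.ones(2)], [3 * np.ones((2, 2)), 3 * np.ones(2)]]
print(fedavg_round(clients, client_sizes=[100, 300])[0])
```
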
Towards Robust Object Detection: Bayesian RetinaNet for Homoscedastic Aleatoric Uncertainty Modeling (Poster)
According to recent studies, commonly used computer vision datasets contain about 4% label errors. For example, the COCO dataset is known for its high level of label noise, which limits its use for training robust deep neural architectures in real-world scenarios. To model such noise, in this paper we propose homoscedastic aleatoric uncertainty estimation and present a series of novel loss functions to address the problem of image object detection at scale. Specifically, the proposed functions are based on Bayesian inference, and we have incorporated them into RetinaNet, a community-adopted deep object detection architecture. We also show that modeling homoscedastic aleatoric uncertainty with our novel functions increases model interpretability and improves object detection performance on the COCO dataset.
Natalia Khanzhina · Alexey Lapenok · Andrey Filchenkov

Stochastic Pruning: Fine-Tuning and PAC-Bayes Bound Optimization (Poster)
We introduce an algorithmic framework for stochastic fine-tuning of pruning masks, starting from masks produced by several baselines. We further show that by minimizing a PAC-Bayes bound with data-dependent priors, we obtain a self-bounded learning algorithm with numerically tight bounds. In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the […]
Soufiane Hayou · Bobby He · Gintare Karolina Dziugaite

Adversarial Learning of a Variational Generative Model with Succinct Bottleneck Representation (Poster)
A new bimodal generative model is proposed for generating conditional and joint samples, accompanied by a training method for learning a succinct bottleneck representation. The proposed model, dubbed the variational Wyner model, is designed based on two classical problems in network information theory, distributed simulation and channel synthesis, in which Wyner's common information arises as the fundamental limit on the succinctness of the common representation. The model is trained by minimizing the symmetric Kullback-Leibler divergence between the variational and model distributions, with regularization terms for common information, reconstruction consistency, and latent-space matching, carried out via an adversarial density-ratio estimation technique.
Jongha (Jon) Ryu · Yoojin Choi · Young-Han Kim · Mostafa El-Khamy · Jungwon Lee

Posterior Temperature Optimization in Variational Inference for Inverse Problems (Poster)
Bayesian methods feature useful properties for solving inverse problems, such as tomographic reconstruction. The prior distribution introduces regularization, which helps solve the ill-posed problem and reduces overfitting. In practice, however, this often results in a suboptimal posterior temperature, and the full potential of the Bayesian approach is not realized. In this paper, we optimize both the parameters of the prior distribution and the posterior temperature using Bayesian optimization. Well-tempered posteriors lead to better predictive performance and improved uncertainty calibration, which we demonstrate for the task of sparse-view CT reconstruction.
Max Laves · Malte Tölle · Alexander Schlaefer · Sandy Engelhardt

Revisiting the Structured Variational Autoencoder (Poster)
The Structured Variational Autoencoder (SVAE) was introduced five years ago. It presented a modeling idea, using probabilistic graphical models (PGMs) as priors on latent variables and deep neural networks (DNNs) to map them to observed data, as well as an inference idea, having the recognition network output conjugate potentials to the PGM prior rather than a full posterior. While mathematically appealing, the SVAE proved impractical to use or extend, as learning required implicit differentiation of a PGM inference algorithm, and the original authors' implementation was in pure Python with no GPU or TPU support. Now, armed with the power of JAX, a software library for automatic differentiation and compilation to CPU, GPU, or TPU targets, we revisit the SVAE. We develop a modular implementation that is orders of magnitude faster than the original code and show examples in a variety of different settings, including a scientific application to animal behavior modeling. Furthermore, we extend the original model by incorporating interior potentials, which allows for more expressive PGM priors, such as the Recurrent Switching Linear Dynamical System (rSLDS). Our JAX implementation of the SVAE and its extensions opens up avenues for many practical applications, extensions, and theoretical investigations.
Yixiu Zhao · Scott Linderman

Robust outlier detection by de-biasing VAE likelihoods (Poster)
Deep networks often make confident yet incorrect predictions when tested with outlier data that is far removed from their training distributions. Likelihoods computed by deep generative models (DGM) are a candidate metric for outlier detection with unlabeled data. Yet, DGM likelihoods are readily biased and unreliable. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest of DGMs. We show that an analytically-derived correction ameliorates a key bias with VAE likelihoods. The bias correction is sample-specific, computationally inexpensive, and readily computed for various visible distributions. Next, we show that a well-known preprocessing technique, contrast stretching, extends the effectiveness of bias correction to improve outlier detection performance. We evaluate our approach comprehensively with nine (grayscale and natural) image datasets, and demonstrate significant advantages, in terms of speed and accuracy, over four state-of-the-art methods.
Kushal Chauhan · Pradeep Shenoy · Manish Gupta · Devarajan Sridharan

The Dynamics of Functional Diversity throughout Neural Network Training (Poster)
Deep ensembles offer consistent performance gains, both in terms of reduced generalization error and improved predictive uncertainty estimates. These performance gains are attributed to functional diversity among the components that make up the ensembles: ensemble performance increases with the diversity of the components. A standard way to generate a diversity of components from a single data set is to train multiple networks on the same data, but different minibatch orders (and augmentations, etc.). In this work, we study when and how this type of diversity decreases during deep neural network training. Using couplings of multiple training runs, we find that diversity rapidly decreases at the start of training, and that increased training time does not restore this lost diversity, implying that early stages of training make irreversible commitments. In particular, our findings provide further evidence that there is less diversity among functions once linear mode connectivity sets in. This motivates studying perturbations to training that upset linear mode connectivity. We then study how functional diversity is affected by retraining after reinitializing the weights in some layers. We find that we recover significantly more diversity by reinitializing layers closer to the input layer, compared to reinitializing layers closer to the output, also restoring the error barrier.
Lee Zamparo · Marc-Etienne Brunet · Thomas George · Sepideh Kharaghani · Gintare Karolina Dziugaite

Biases in variational Bayesian neural networks (Poster)
Variational inference recently became the de facto standard method for approximate Bayesian neural networks. However, the standard mean-field approach (MFVI) possesses many undesirable behaviours. This short paper empirically investigates the variational biases of MFVI and other variational families. The preliminary results shed light on the poor performance of many variational approaches for model selection.
Thang Bui

Bayesian Inference in Augmented Bow Tie Networks (Poster)
We develop a deep generative model that generalizes feed-forward, rectified linear neural networks with stochastic activations. We call these models bow tie networks because of the shape of their activation distributions. We then leverage the Pólya-gamma augmentation scheme to render the model conditionally conjugate, and we derive a block Gibbs sampling algorithm to approximate the posterior distribution over activations and model parameters. The resulting algorithm is massively parallelizable. We show a proof-of-concept of this model and Bayesian inference algorithm on a variety of standard regression benchmarks.
Jimmy Smith · Dieterich Lawson · Scott Linderman

Fast Finite Width Neural Tangent Kernel (Poster)
The Neural Tangent Kernel (NTK), defined as the outer product of the neural network (NN) Jacobians, $\Theta_\theta(x_1, x_2) = \left[\partial f(\theta, x_1)\big/\partial \theta\right] \left[\partial f(\theta, x_2)\big/\partial \theta\right]^T$, has emerged as a central object of study in deep learning. In the infinite width limit, the NTK can sometimes be computed analytically and is useful for understanding training and generalization of NN architectures. At finite widths, the NTK is also used to better initialize NNs, compare the conditioning across models, perform architecture search, and do meta-learning. Unfortunately, the finite-width NTK is notoriously expensive to compute, which severely limits its practical utility. We perform the first in-depth analysis of the compute and memory requirements for NTK computation in finite width networks. Leveraging the structure of neural networks, we further propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK, dramatically improving efficiency. We open-source (https://github.com/iclr2022anon/fast_finite_width_ntk) our two algorithms as general-purpose JAX function transformations that apply to any differentiable computation (convolutions, attention, recurrence, etc.) and introduce no new hyper-parameters.
Roman Novak · Jascha Sohl-Dickstein · Samuel Schoenholz

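The definition quoted in the abstract can be implemented directly. The sketch below computes the finite-width NTK of a toy model as an explicit Jacobian outer product, using PyTorch's jacobian utility rather than the authors' JAX transformations; it is the expensive baseline their algorithms accelerate, not their method.

```python
import torch

def empirical_ntk(f, params, x1, x2):
    """Naive finite-width NTK: Theta(x1, x2) = J(x1) @ J(x2)^T, where J is
    the Jacobian of the flattened network outputs w.r.t. the parameters."""
    jac = lambda x: torch.autograd.functional.jacobian(lambda p: f(p, x), params)
    return jac(x1) @ jac(x2).T

# Toy "network": a linear model with flattened outputs.
torch.manual_seed(0)
params = torch.randn(15)                             # a flattened 3x5 weight
f = lambda p, x: (x @ p.reshape(3, 5)).reshape(-1)
x1, x2 = torch.randn(4, 3), torch.randn(2, 3)
print(empirical_ntk(f, params, x1, x2).shape)        # torch.Size([20, 10])
```
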
Reliable Uncertainty Quantification of Deep Learning Models for a Free Electron Laser Scientific Facility (Poster)
Particle accelerators are essential instruments for scientific experiments. They provide different experiments with particle beams of different parameters (e.g. beam energies or durations). This is accomplished by changing a wide variety of controllable settings, in a process called tuning. This is a challenging task, as many particle accelerators are complex machines with thousands of components, each of which contributes sources of uncertainty. Fast, accurate models of these systems could aid rapid customization of beams, but in order to accomplish this reliably, quantified uncertainties are essential. We address the problem of obtaining reliable uncertainties from learned models of a noisy, high-dimensional, nonlinear accelerator system: the X-ray free electron laser at the Linac Coherent Light Source, which is a scientific user facility. We examine the efficacy of Bayesian Neural Networks (BNNs) at reliably quantifying predictive uncertainty and compare them with Quantile Regression Neural Networks (QRNNs). The QRNN models provide mean absolute error on predictions that is consistent with the noise of the measured data. We find the BNN is sensitive to outliers and is substantially more computationally expensive, but it still captures the general trend of the target data.
Lipi Gupta · Aashwin Mishra · Auralee Edelen

Latent Goal Allocation for Multi-Agent Goal-Conditioned Self-Supervised Learning (Poster)
Multi-agent learning plays an essential role in ubiquitous practical applications, including game theory and autonomous driving. Goal-conditioned learning, on the other hand, has attracted a surge of interest with its capability of solving a rich variety of tasks and configurations. Nevertheless, scenarios that combine both multi-agent and goal-conditioned settings have not been considered previously, owing to the daunting challenges of both areas. In this work, we target multi-agent goal-conditioned tasks, with the objective of learning a universal policy for multiple agents to reach a set of sub-goals. This task requires the agents to act differently conditioned on their assigned sub-goals. Since in many scenarios it is infeasible to have access to direct rewards and sub-goal assignment labels for each agent, we resort to imitation learning using only expert demonstrations, without the need for rewards or sub-goal assignment labels. To this end, we propose a probabilistic graphical model, named Latent Goal Allocation (LGA), which explicitly promotes the sub-goal assignment as a latent variable to generate the corresponding action for each agent. We conduct experiments showing that LGA outperforms existing baselines with interpretable sub-goal assignment processes.
Laixi Shi · Peide Huang · Rui Chen

Constraining cosmological parameters from N-body simulations with Bayesian Neural Networks (Poster)
In this paper we use the Quijote simulations to extract cosmological parameters with Bayesian Neural Networks. These models have a remarkable ability to estimate the associated uncertainty, which is one of the ultimate goals of the precision-cosmology era. We demonstrate the advantages of BNNs for extracting more complex output distributions and non-Gaussianity information from the simulations.
Hector Javier Hortua

Evaluating Deep Learning Uncertainty Quantification Methods for Neutrino Physics Applications (Poster)
We evaluate uncertainty quantification (UQ) methods for deep learning applied to liquid argon time projection chamber (LArTPC) physics analysis tasks. As deep learning applications enter widespread usage in physics data analysis, neural networks with reliable estimates of prediction uncertainty and robust performance against overconfidence and out-of-distribution (OOD) samples are critical for their full deployment in analyzing experimental data. While numerous UQ methods have been tested on simple datasets, performance evaluations for more complex tasks and datasets have been scarce. We assess the application of selected deep learning UQ methods to the task of particle classification in a simulated 3D LArTPC point cloud dataset. We observe that uncertainty-enabled networks not only allow for better rejection of prediction mistakes and OOD detection, but also generally achieve higher overall accuracy across different task settings.
Dae Heun Koh · Aashwin Mishra · Kazuhiro Terao

Model-embedding flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling (Poster)
Normalizing flows have shown great success as general-purpose density estimators. However, many real-world applications require the use of domain-specific knowledge, which normalizing flows cannot readily incorporate. We propose embedded-model flows (EMF), which alternate general-purpose transformations with structured layers that embed domain-specific inductive biases. These layers are automatically constructed by converting user-specified differentiable probabilistic models into equivalent bijective transformations. We also introduce gated structured layers, which allow bypassing the parts of the models that fail to capture the statistics of the data. We demonstrate that EMFs can be used to induce desirable properties such as multimodality, hierarchical coupling and continuity. Furthermore, we show that EMFs enable a high-performance form of variational inference where the structure of the prior model is embedded in the variational architecture. In our experiments, we show that this approach outperforms state-of-the-art methods in common structured inference problems.
Gianluigi Silvestri · Emily Fertig · Dave Moore · Luca Ambrogioni

Likelihood-free Density Ratio Acquisition Functions are not Equivalent to Expected Improvements (Poster)
Bayesian Optimization (BO) is one of the most effective black-box optimization methods, yet the need to ensure analytical tractability in the posterior predictive makes it challenging to apply BO to large-scale problems with high-dimensional observations. For these problems, likelihood-free methods present a promising avenue since they can work with more expressive models and are often more efficient. Previous papers have claimed that density ratios acquired from likelihood-free inference are equivalent to the widely popular expected improvement acquisition function, allowing us to perform BO without expensive exact posterior inference. Unfortunately, we show in this paper that the claim is false; we identify errors in their reasoning and illustrate a counter-example where density ratios are inversely correlated to expected improvements. Our results suggest that additional care is needed when interpreting and applying density ratio acquisition functions from likelihood-free inference.
Jiaming Song · Stefano Ermon

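For reference, the expected-improvement quantity against which the density-ratio claims are checked has a standard closed form under a Gaussian predictive; a minimal sketch (minimization convention, illustrative only):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    """Closed-form EI for a Gaussian predictive N(mu, sigma^2) when
    minimizing: EI(x) = E[max(best_f - f(x), 0)]."""
    sigma = np.maximum(sigma, 1e-12)     # guard against zero variance
    z = (best_f - mu) / sigma
    return (best_f - mu) * norm.cdf(z) + sigma * norm.pdf(z)

print(expected_improvement(mu=np.array([0.0, 1.0]),
                           sigma=np.array([1.0, 1.0]), best_f=0.5))
```
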
Object-Factored Models with Partially Observable State (Poster)
In a typical robot manipulation setting, the physical laws that govern object dynamics never change, but the set of objects does. To complicate matters, objects may have intrinsic properties that are not directly observable (e.g., center of mass or friction coefficients). In this work, we introduce a latent-variable model of object-factored dynamics. This model represents uncertainty about the dynamics using deep ensembles while capturing uncertainty about each object's intrinsic properties using object-specific latent variables. We show that this model allows a robot to rapidly generalize to new objects by using information theoretic active learning. Additionally, we highlight the benefits of the deep ensemble for robust performance in downstream tasks.
Isaiah Brand · Michael Noseworthy · Sebastian Castro · Nick Roy

On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications (Poster)
Deep neural networks have shown great success in prediction quality, while reliable and robust uncertainty estimation remains a challenge. Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks including embedded and mobile applications, such as virtual reality, augmented reality, sensor fusion, and perception. These applications often require a compromise in complexity to obtain uncertainty estimates due to very limited memory and compute resources. We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework; specifically, we diversify sampled subnetworks, leverage dropout patterns, and use a branching technique to improve predictive performance while maintaining fast computations. We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task. Our results show the effectiveness of our approach by reaching close to Deep Ensemble prediction quality and uncertainty estimation, while still achieving faster inference on resource-limited mobile platforms.
Johanna Rock · Tiago Azevedo · René de Jong · Daniel Ruiz · Partha Maji

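A minimal sketch of the Monte Carlo dropout baseline the paper starts from (toy model; the paper's subnetwork diversification, dropout patterns, and branching are not shown): dropout is kept active at test time and several stochastic forward passes are averaged.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=16):
    """MC dropout: keep dropout stochastic at inference and summarize the
    spread over several forward passes as predictive uncertainty."""
    model.train()                        # train() keeps nn.Dropout active
    preds = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
    return preds.mean(0), preds.var(0)   # predictive mean and variance

model = torch.nn.Sequential(
    torch.nn.Linear(8, 32), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.2), torch.nn.Linear(32, 4),
)
mean, var = mc_dropout_predict(model, torch.randn(5, 8))
```
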
Dropout and Ensemble Networks for Thermospheric Density Uncertainty Estimation (Poster)
Accurately estimating spacecraft location is of crucial importance for a variety of safety-critical tasks in low-Earth orbit (LEO), including satellite collision avoidance and re-entry. Solar activity largely impacts the physical characteristics of the thermosphere, consequently affecting the trajectories of spacecraft in LEO. State-of-the-art models for estimating thermospheric density are either computationally expensive or under-perform during extreme solar activity. Moreover, these models provide single-point solutions, neglecting critical information on the associated uncertainty. In this work we use and compare two methods, Monte Carlo dropout and deep ensembles, to estimate thermospheric total mass density and its associated uncertainty. The networks are trained using ground-truth density data from five well-calibrated satellites, with orbital data information and solar and geomagnetic indices as input. The trained models improve upon operational solutions for a subset of satellites, while also providing a measure of uncertainty in the density estimation.
Stefano Bonasera · Giacomo Acciarini · Jorge Pérez-Hernández · Bernard Benson · Edward Brown · Eric Sutton · Moriba Jah · Christopher Bridges · Atilim Gunes Baydin

Benchmark for Out-of-Distribution Detection in Deep Reinforcement Learning (Poster)
Reinforcement Learning (RL) based solutions are being adopted in a variety of domains including robotics, health care, and industrial automation. Most attention is given to when these solutions work well, but they fail when presented with out-of-distribution inputs: RL policies share the same faults as most machine learning models. Out-of-distribution detection for RL is generally not well covered in the literature, and there is a lack of benchmarks for this task. In this work we propose a benchmark to evaluate OOD detection methods in a Reinforcement Learning setting, by modifying the physical parameters of non-visual standard environments or corrupting the state observations of visual environments. We discuss ways to generate custom RL environments that can produce OOD data, and evaluate three uncertainty methods for the OOD detection task. Our results show that ensemble methods have the best OOD detection performance, with a lower standard deviation across multiple environments.
Aaqib Parvez Mohammed · Matias Valdenegro-Toro

Can Network Flatness Explain the Training Speed-Generalisation Connection? (Poster)
Recent work has shown that training speed, as estimated by the sum over training losses, is predictive of generalization performance. From a Bayesian perspective, this metric can be theoretically linked to marginal likelihood in linear models. However, it is unclear why the relationship holds for DNNs and what the underlying mechanisms are. We hypothesise that this relationship holds in DNNs because of network flatness, which causes both fast training and good generalization. We investigate the hypothesis in varying settings and find that it may hold when the variance in the stochastic gradient estimation is moderate, with either logit averaging or no data transformation at all. This paper specifies the conditions future work should impose when investigating the connecting mechanism.
Albert Qiaochu Jiang · Clare Lyle · Lisa Schut · Yarin Gal

Mixture-of-experts VAEs can disregard unimodal variation in surjective multimodal data (Poster)
Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed variational autoencoders (VAEs) that generate multimodal data. We consider surjective data, where single datapoints from one modality (such as labels) describe multiple datapoints from another modality (such as images). We theoretically and empirically demonstrate that multimodal VAEs with a mixture-of-experts posterior can struggle to capture unimodal variability in surjective data.
Jannik Wolff · Tassilo Klein · Moin Nabi · Rahul G Krishnan · Shinichi Nakajima

Depth Uncertainty Networks for Active Learning (Poster)
In active learning, the size and complexity of the training dataset change over time. Simple models that are well specified by the amount of data available at the start of active learning might suffer from bias as more points are actively sampled. Flexible models that might be well suited to the full dataset can suffer from overfitting towards the start of active learning. We tackle this problem using Depth Uncertainty Networks (DUNs), a BNN variant in which the depth of the network, and thus its complexity, is inferred. We find that DUNs outperform other BNN variants on several active learning tasks. Importantly, we show that on the tasks in which DUNs perform best they present notably less overfitting than baselines.
Chelsea Murray · James Allingham · Javier Antorán · José Miguel Hernández-Lobato

The Peril of Popular Deep Learning Uncertainty Estimation Methods (Poster)
Uncertainty estimation (UE) techniques, such as the Gaussian process (GP), Bayesian neural networks (BNNs), and Monte Carlo dropout (MCDropout), aim to improve the interpretability of machine learning models by assigning an estimated uncertainty value to each of their prediction outputs. However, since unreliable uncertainty estimates can have fatal consequences in practice, this paper analyzes the above techniques. Firstly, we show that GP methods always yield high uncertainty estimates on out-of-distribution (OOD) data. Secondly, we show on a 2D toy example that both BNNs and MCDropout do not give high uncertainty estimates on OOD samples. Finally, we show empirically that this pitfall of BNNs and MCDropout holds on real-world datasets as well. Our insights (i) raise awareness for the more cautious use of currently popular UE methods in Deep Learning, (ii) encourage the development of UE methods that approximate GP-based methods instead of BNNs and MCDropout, and (iii) provide empirical setups that can be used to verify the OOD performance of any other UE method.
Yehao Liu · Matteo Pagliardini · Tatjana Chavdarova · Sebastian Stich

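The GP behaviour the paper holds up as the desirable reference is easy to reproduce. The toy sketch below (not the paper's setup) shows the posterior variance of an RBF-kernel GP reverting to the prior variance far from the training data, which is exactly the high OOD uncertainty the other methods fail to match:

```python
import numpy as np

def gp_posterior_variance(x_train, x_test, lengthscale=1.0, noise=1e-2):
    """Posterior variance of a GP with a unit-variance RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    Ks, Kss = k(x_train, x_test), k(x_test, x_test)
    return np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))

x_train = np.linspace(-2, 2, 20)
print(gp_posterior_variance(x_train, np.array([0.0])))   # small: in-distribution
print(gp_posterior_variance(x_train, np.array([10.0])))  # near 1: far OOD
```
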
Dependence between Bayesian neural network units (Poster)
The connection between Bayesian neural networks and Gaussian processes gained a lot of attention in the last few years, with the flagship result that hidden units converge to a Gaussian process limit when the layer widths tend to infinity. Underpinning this result is the fact that hidden units become independent in the infinite-width limit. Our aim is to shed some light on hidden-unit dependence properties in practical finite-width Bayesian neural networks. In addition to theoretical results, we assess empirically the impact of depth and width on hidden-unit dependence properties.
Mariia Vladimirova · Julyan Arbel · Stephane Girard

Relaxed-Responsibility Hierarchical Discrete VAEs (Poster)
Successfully training Variational Autoencoders (VAEs) with a hierarchy of discrete latent variables remains an area of active research. Vector-Quantised VAEs are a powerful approach to discrete VAEs, but naive hierarchical extensions can be unstable when training. Leveraging insights from classical methods of inference, we introduce Relaxed-Responsibility Vector-Quantisation, a novel way to parameterise discrete latent variables, a refinement of relaxed Vector-Quantisation that gives better performance and more stable training. This enables a novel approach to hierarchical discrete variational autoencoders with numerous layers of latent variables (here up to 32) that we train end-to-end. Within hierarchical probabilistic deep generative models with discrete latent variables trained end-to-end, we achieve state-of-the-art bits-per-dim results for various standard datasets.
Matthew Willetts · Xenia Miscouridou · Stephen J Roberts · Chris C Holmes

Precision Agriculture Based on Bayesian Neural Network (Poster)
Precision agriculture, which uses various sources of information to manage crop production, has become an important approach to mitigating the food supply problem around the world. Accurate prediction of crop yield is the main task of precision agriculture. With the help of neural networks, precision agriculture has progressed rapidly in past decades. However, neural networks are notoriously data-hungry, and data collection in agriculture is expensive and time-consuming. Bayesian neural networks, which extend neural networks with Bayesian inference, are useful under such circumstances. Moreover, the Bayesian approach allows estimating the uncertainty associated with predictions, which makes the results more reliable. In this paper, a Bayesian neural network was applied to a small dataset, and the results show that the Bayesian neural network is more reliable under such circumstances.
Lei Zhao

Decomposing Representations for Deterministic Uncertainty Estimation (Poster)
Uncertainty estimation is a key component in any deployed machine learning system. One way to evaluate uncertainty estimation is using "out-of-distribution" (OoD) detection, that is, distinguishing between the training data distribution and an unseen different data distribution using uncertainty. In this work, we show that current feature density based uncertainty estimators cannot perform well consistently across different OoD detection settings. To solve this, we propose to decompose the learned representations and integrate the uncertainties estimated on them separately. Through experiments, we demonstrate that we can greatly improve the performance and the interpretability of the uncertainty estimation.
Haiwen Huang · Joost van Amersfoort · Yarin Gal

Gaussian dropout as an information bottleneck layer (Poster)
As models become more powerful, they can acquire the ability to fit the data well in multiple qualitatively different ways. At the same time, we might have requirements other than high predictive performance that we would like the model to satisfy. One way to express such preferences is by controlling the information flow in the model with carefully placed information bottleneck layers, which limit the amount of information that passes through them by applying noise to their inputs. The most notable example of such a layer is the stochastic representation layer of the Deep Variational Information Bottleneck, using which requires adding a variational upper bound on the mutual information between its inputs and outputs as a penalty to the loss function. We show that using Gaussian dropout, which involves multiplicative Gaussian noise, achieves the same goal in a simpler way without requiring any additional terms in the objective. We evaluate the two approaches in the generative modelling setting, by using them to encourage the use of latent variables in a VAE with an autoregressive decoder for modelling images.
Melanie Rey · Andriy Mnih

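A minimal sketch of the layer under discussion, assuming the common multiplicative form x * (1 + sigma * eps) with eps ~ N(0, 1); the VAE experiments and the information-theoretic analysis are in the paper, not here:

```python
import torch

class GaussianDropout(torch.nn.Module):
    """Multiplicative Gaussian noise. The noise limits how much information
    about the input can pass through the layer, giving the bottleneck
    effect without an extra penalty term in the objective."""
    def __init__(self, sigma=0.5):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training:
            return x * (1.0 + self.sigma * torch.randn_like(x))
        return x
```
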
Funnels: Exact maximum likelihood with dimensionality reduction (Poster)
Normalizing flows are diffeomorphic, typically dimension-preserving, models trained using the likelihood of the model. We use the SurVAE framework to construct dimension reducing surjective flows via a new layer, known as the funnel. We demonstrate its efficacy on a variety of datasets, and show it improves upon or matches the performance of existing flows while having a reduced latent space size. This layer can also be used with convolutional and feed-forward layers.
Samuel Klein · John Raine · Tobias Golling · Slava Voloshynovskiy · Sebastion Pina-Otey

Progress in Self-Certified Neural Networks (Poster)
A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality with a statistical certificate that is valid on unseen data. Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead not only to accurate predictors, but also to tight risk certificates, bearing promise towards self-certified learning. In this context, learning and certification strategies based on PAC-Bayes bounds are especially attractive due to their ability to leverage all data to learn a posterior and simultaneously certify its risk. In this paper, we assess the progress towards self-certification in neural networks learnt by PAC-Bayes inspired objectives. We empirically compare (on 4 classification datasets) classical test set bounds for deterministic predictors and a PAC-Bayes bound for randomised self-certified predictors. We show that in data starvation regimes, holding out data for the test set bounds adversely affects generalisation performance, while learning and certification strategies based on PAC-Bayes bounds do not suffer from this drawback. We find that probabilistic neural networks learnt by PAC-Bayes inspired objectives lead to certificates that can be surprisingly competitive with commonly used test set bounds.
Maria Perez-Ortiz · Omar Rivasplata · Emilio Parrado-Hernández · Benjamin Guedj · John Shawe-Taylor

Multimodal Relational VAE (Poster)
In this work, we propose a new formulation for multimodal VAEs to model and learn the relationship between data types. Despite their recent progress, current multimodal generative methods are based on simplistic assumptions regarding the relation between data types, which leads to a trade-off between coherence and quality of generated samples, even for simple toy datasets. The proposed method learns the relationship between data types instead of relying on pre-defined and limiting assumptions. Based on the principles of variational inference, we change the posterior approximation to explicitly include information about the relation between data types. We show empirically that the simplified assumption of a single shared latent space leads to inferior performance for a dataset with additional pairwise shared information.
Thomas Sutter · Julia Vogt

Laplace Approximation with Diagonalized Hessian for Over-parameterized Neural Networks (Poster)
Bayesian Neural Networks (BNNs) provide valid uncertainty estimation on their feedforward outputs. However, it can become computationally prohibitive to apply them to modern large-scale neural networks. In this work, we combine Laplace approximation with linearized inference for real-time and robust uncertainty evaluation. Specifically, we study the effectiveness and computational necessity of a diagonal Hessian approximation in the Laplace approximation of over-parameterized networks. The proposed approach is investigated on object detection tasks in an autonomous driving scenario and demonstrates faster inference speed and convincing results.
Ming Gui · Ziqing Zhao · Tianming Qiu · Hao Shen

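For orientation, a minimal sketch of the diagonal Laplace posterior being studied: around a MAP estimate, each parameter gets an independent Gaussian whose precision is the corresponding Hessian diagonal plus the prior precision. How that diagonal is obtained for a real network (e.g. a GGN or Fisher approximation) is a modelling choice and is assumed here.

```python
import torch

def diagonal_laplace_sampler(map_params, hessian_diag, prior_precision=1.0):
    """Diagonal Laplace approximation: theta ~ N(theta_MAP, Sigma) with
    Sigma = 1 / (diag(H) + prior precision). Returns a sampling function."""
    var = 1.0 / (hessian_diag + prior_precision)
    return lambda: map_params + var.sqrt() * torch.randn_like(map_params)

sample = diagonal_laplace_sampler(torch.zeros(10), torch.full((10,), 50.0))
draws = torch.stack([sample() for _ in range(100)])  # use for MC predictions
```
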
Exploring the Limits of Epistemic Uncertainty Quantification in Low-Shot Settings (Poster)
Uncertainty quantification in neural networks promises to increase the safety of AI systems, but it is not clear how performance might vary with the training set size. In this paper we evaluate seven uncertainty methods on Fashion MNIST and CIFAR10, as we sub-sample and produce varied training set sizes. We find that calibration error and out-of-distribution detection performance strongly depend on the training set size, with most methods being miscalibrated on the test set with small training sets. Gradient-based methods seem to poorly estimate epistemic uncertainty and are the most affected by training set size. We expect our results can guide future research into uncertainty quantification and help practitioners select methods based on their particular available data.
Matias Valdenegro-Toro

Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning (Poster)
Deep neural networks are prone to overconfident predictions on outliers. Bayesian neural networks and deep ensembles have both been shown to mitigate this problem to some extent. In this work, we aim to combine the benefits of the two approaches by proposing to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep neural networks. The method can be used post hoc with any set of pre-trained networks and only requires a small computational and memory overhead compared to regular ensembles. We theoretically validate that our approach mitigates overconfidence "far away" from the training data and empirically compare against state-of-the-art baselines on standard uncertainty quantification benchmarks.
Runa Eschenhagen · Erik Daxberger · Philipp Hennig · Agustinus Kristiadi

Kronecker-Factored Optimal Curvature (Poster)
The current scalable Bayesian methods for Deep Neural Networks (DNNs) often rely on the Fisher Information Matrix (FIM). For the tractable computation of the FIM, the Kronecker-Factored Approximate Curvature (K-FAC) method is widely adopted, which approximates the true FIM by a layer-wise block-diagonal matrix, where each diagonal block is then Kronecker-factored. In this paper, we propose an alternative formulation to obtain the Kronecker-factored FIM. The key insight is to cast the given FIM computations into an optimization problem over the sums of Kronecker products. In particular, we prove that this formulation is equivalent to the best rank-one approximation problem, where the well-known power iteration method is guaranteed to converge to an optimal rank-one solution, resulting in our novel algorithm: the Kronecker-Factored Optimal Curvature (K-FOC). In a proof-of-concept experiment, we show that the proposed algorithm can achieve more accurate estimates of the true FIM when compared to the K-FAC method.
Dominik Schnaus · Jongseok Lee · Rudolph Triebel

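The optimization problem described in the abstract is, for a generic matrix, the classical nearest-Kronecker-product problem, and the power-iteration solution can be sketched in a few lines (Van Loan-Pitsianis rearrangement; applying this to FIM blocks, as K-FOC does, is not shown):

```python
import numpy as np

def rearrange(M, m1, n1, m2, n2):
    """Van Loan-Pitsianis rearrangement: row (i, j) of R is the (i, j)-th
    (m2 x n2) block of M flattened, so that ||M - A kron B||_F equals
    ||R - vec(A) vec(B)^T||_F."""
    R = np.empty((m1 * n1, m2 * n2))
    for i in range(m1):
        for j in range(n1):
            R[i * n1 + j] = M[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2].ravel()
    return R

def nearest_kronecker(M, m1, n1, m2, n2, n_iter=200, seed=0):
    """Best Kronecker factors of M in Frobenius norm via power iteration
    on the rearranged matrix (a rank-one approximation problem)."""
    R = rearrange(M, m1, n1, m2, n2)
    v = np.random.default_rng(seed).normal(size=R.shape[1])
    for _ in range(n_iter):
        v = R.T @ (R @ v)
        v /= np.linalg.norm(v)
    u = R @ v                                  # R ~= outer(u, v)
    return u.reshape(m1, n1), v.reshape(m2, n2)

# Sanity check on an exact Kronecker product.
A, B = np.array([[1., 2.], [3., 4.]]), np.array([[0., 1.], [1., 2.]])
A_hat, B_hat = nearest_kronecker(np.kron(A, B), 2, 2, 2, 2)
print(np.allclose(np.kron(A_hat, B_hat), np.kron(A, B)))  # True
```
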
Contrastive Generative Adversarial Network for Anomaly Detection (Poster)
Anomaly detection (AD) is a fundamental challenge in machine learning that finds samples that do not belong to the distribution of the training data. Recently, self-supervised learning approaches and, in particular, contrastive learning show promising results in various machine vision applications, mitigating the hunger of traditional supervised deep learning approaches for enormous amounts of labeled data. In this work, we adopt the idea of contrastive learning for reconstruction-based anomaly detection models. Our contrastive learning approach contrasts a sample with local feature maps of itself, instead of contrasting a given sample with other instances as in conventional contrastive learning approaches. Our anomaly detection model, based on a contrastive generative adversarial network (AD-CGAN), is shown to obtain state-of-the-art performance in multiple benchmark datasets. AD-CGAN outperforms the existing reconstruction-based approaches by more than $15\%$ ROC-AUC in several benchmark experiments.
Laya Rafiee Sevyeri · Thomas Fevens

Certifiably Robust Variational Autoencoders (Poster)
We derive bounds on the minimal size of an input perturbation required to change a VAE’s reconstruction by more than an allowed amount, with these bounds depending on key parameters such as the Lipschitz constants of the encoder and decoder. Our bounds allow one to specify a desired level of robustness upfront and then train a VAE that is certified to achieve this robustness.
Ben Barrett · Alexander Camuto · Matthew Willetts · Thomas Rainforth

On Symmetries in Variational Bayesian Neural Nets (Poster)
Probabilistic inference of Neural Network parameters is challenging due to the highly multi-modal likelihood functions. Most importantly, the permutation invariance of the neurons of the hidden layers renders the likelihood function unidentifiable, with a factorial number of equivalent (symmetric) modes, independent of the data. We show that variational Bayesian methods that approximate the (multi-modal) posterior by a (uni-modal) Gaussian distribution are biased towards approximations with identical (e.g. zero-centred) weights, resulting in severe underfitting. This explains the common empirical observation that, in contrast to MCMC methods, variational approximations typically collapse most weights to the (zero-centred) prior. We propose a simple modification to the likelihood function that breaks the symmetry, using fixed semi-orthogonal matrices as skip connections in each layer. Initial empirical results show improved predictive performance.
Richard Kurle · Tim Januschowski · Jan Gasthaus · Bernie Wang

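The symmetry in question is easy to verify directly: permuting the hidden units of a one-hidden-layer network, together with the matching rows and columns of the adjacent weight matrices, leaves the function unchanged, so every posterior mode has factorially many copies. A toy check:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2 = rng.normal(size=(3, 16))
f = lambda x, W1, b1, W2: W2 @ np.tanh(W1 @ x + b1)

perm = rng.permutation(16)                    # relabel the 16 hidden units
x = rng.normal(size=4)
print(np.allclose(f(x, W1, b1, W2),
                  f(x, W1[perm], b1[perm], W2[:, perm])))  # True
```
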
Greedy Bayesian Posterior Approximation with Deep Ensembles (Poster)
Ensembles of independently trained neural networks are a state-of-the-art approach to estimate predictive uncertainty in Deep Learning, and can be interpreted as an approximation of the posterior distribution via a mixture of delta functions. The training of ensembles relies on non-convexity of the loss landscape and random initialization of their individual members, making the resulting posterior approximation uncontrolled. This paper proposes a novel and principled method to tackle this limitation, minimizing an $f$-divergence between the true posterior and a kernel density estimator in a function space. We analyze this objective from a combinatorial point of view, and show that it is submodular with respect to mixture components for any $f$. Subsequently, we consider the problem of ensemble construction, and from the marginal gain of the total objective, we derive a novel diversity term for training ensembles greedily. The performance of our approach is demonstrated on computer vision out-of-distribution detection benchmarks in a range of architectures trained on multiple datasets. The source code of our method is publicly available at https://github.com/MIPT-Oulu/greedy_ensembles_training.
Aleksei Tiulpin · Matthew Blaschko

On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty (Poster)
Inducing point Gaussian process approximations are often considered a gold standard in uncertainty estimation since they retain many of the properties of the exact GP and scale to large datasets. A major drawback is that they have difficulty scaling to high dimensional inputs. Deep Kernel Learning (DKL) promises a solution: a deep feature extractor transforms the inputs over which an inducing point Gaussian process is defined. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that with no constraints, the DKL objective pushes "far-away" data points to be mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, which demonstrates uncertainty quality outperforming previous DKL and other single forward pass uncertainty methods, while maintaining the speed and accuracy of standard neural networks.
Joost van Amersfoort · Lewis Smith · Andrew Jesson · Oscar Key · Yarin Gal

An Empirical Study of Neural Kernel Bandits (Poster)
Neural bandits have enabled practitioners to operate efficiently on problems with non-linear reward functions. While contextual bandits commonly utilize Gaussian process (GP) predictive distributions for decision making, the most successful neural variants use only the last-layer parameters in the derivation. Research on neural kernels (NK) has recently established a correspondence between deep networks and GPs that takes into account all the parameters of a NN and can be trained more efficiently than most Bayesian NNs. We propose to directly apply NK-induced distributions to guide an upper confidence bound or Thompson sampling-based policy. We show that NK bandits achieve state-of-the-art performance on highly non-linear structured data. Furthermore, we analyze practical considerations such as training frequency and model partitioning. We believe our work will help better understand the impact of utilizing NKs in applied settings.
Michal Lisicki · Arash Afkanpour · Graham Taylor

Structured Stochastic Gradient MCMC: a hybrid VI and MCMC approach (Poster)
Stochastic gradient Markov chain Monte Carlo (SGMCMC) is considered the gold standard for Bayesian inference in large-scale models, such as Bayesian neural networks. Since practitioners face speed versus accuracy tradeoffs in these models, variational inference (VI) is often the preferable option. Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. In this work, we propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form and allows practitioners to specify the exact dependencies the algorithm should respect or break. The approach relies on a new Langevin-type algorithm that operates on a modified energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a "dropout" manner, leading to even more scalability. By implementing the scheme on a ResNet-20 architecture, we obtain better predictive likelihoods and larger effective sample sizes than full SGMCMC.
Antonios Alexos · Alex Boyd · Stephan Mandt

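For context, the basic Langevin-type update that the proposed scheme modifies is plain stochastic gradient Langevin dynamics; a minimal sketch (the paper's structured averaging over earlier chain iterates is not shown):

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update: a (stochastic) gradient step on the log posterior
    plus Gaussian noise whose scale matches the step size."""
    noise = rng.normal(size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + np.sqrt(step_size) * noise

# Toy: sample from N(0, 1), where grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta, samples = np.zeros(1), []
for _ in range(5000):
    theta = sgld_step(theta, lambda t: -t, step_size=0.1, rng=rng)
    samples.append(theta.copy())
print(np.mean(samples), np.var(samples))      # roughly 0 and 1
```
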
Contrastive Representation Learning with Trainable Augmentation Channel (Poster)
In contrastive representation learning, data representations are trained so that they can classify image instances even when the images are altered by augmentations. However, depending on the dataset, some augmentations can damage the information in the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by formalizing a stochastic encoding process in which there exists a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder. We show that, with the InfoMax objective based on this framework, we can learn a data-dependent distribution of augmentations that avoids the collapse of the representation.
Masanori Koyama · Kentaro Minami · Takeru Miyato · Yarin Gal

-
|
Power-law asymptotics of the generalization error for GP regression under power-law priors and targets
(
Poster
)
link »
We study the power-law asymptotics of learning curves for Gaussian process regression (GPR). When the eigenspectrum of the prior decays with rate $\alpha$ and the eigenexpansion coefficients of the target function decay with rate $\beta$, we show that the Bayesian generalization error behaves as $\tilde O(n^{\max\{\frac{1}{\alpha}-1, \frac{1-2\beta}{\alpha}\}})$ with high probability over the draw of $n$ input samples. Infinitely wide neural networks can be related to GPR with respect to the Neural Network Gaussian Process kernel, which in several cases is known to have a power-law spectrum. Hence our methods can be applied to study the generalization error of infinitely wide neural networks. We present toy experiments demonstrating the theory.
|
Hui Jin · Pradeep Kr. Banerjee · Guido Montufar 🔗 |
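The exponent in the $\tilde O(\cdot)$ bound above is easy to evaluate for given decay rates; a small helper, purely for illustration:

```python
def generalization_exponent(alpha: float, beta: float) -> float:
    # Exponent of n in the high-probability bound from the abstract:
    # max{1/alpha - 1, (1 - 2*beta)/alpha}.
    return max(1.0 / alpha - 1.0, (1.0 - 2.0 * beta) / alpha)

# e.g. alpha = 2, beta = 1: both branches give -1/2, so error ~ n^{-1/2}
print(generalization_exponent(alpha=2.0, beta=1.0))
```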
-
|
Deep Bayesian Learning for Car Hacking Detection
(
Poster
)
link »
With the rise of self-driving cars and connected vehicles, cars are equipped with various devices to assist drivers or support self-driving systems. Undoubtedly, cars have become more intelligent as we deploy more and more devices and software on them. Accordingly, the security of driver-assistance and self-driving systems becomes a life-threatening issue, as smart cars can be invaded by malicious attacks that cause traffic accidents. Currently, canonical machine learning and deep learning methods are extensively employed in car hacking detection. However, machine learning and deep learning methods can easily be overconfident and defeated by carefully designed adversarial examples. Moreover, those methods cannot provide explanations for security engineers for further analysis. In this work, we investigated Deep Bayesian Learning models to detect and analyze car hacking behaviors. Bayesian learning methods can capture the uncertainty of the data and avoid overconfidence issues. Moreover, Bayesian models can provide more information to support the prediction results, helping security engineers further identify the attacks. We compared our model with deep learning models and the results show the advantages of our proposed model. The code of this work is publicly available. |
Laha Ale · Scott King · Ning Zhang 🔗 |
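The abstract does not specify which Bayesian approximation is used; as one common instantiation, the sketch below reports a predictive mean and spread via Monte Carlo dropout, with random weights standing in for a trained detector and an assumed 16-dimensional message feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = 0.3 * rng.normal(size=(16, 8)), 0.3 * rng.normal(size=(8, 2))

def mc_forward(x, drop_p=0.2):
    # One stochastic forward pass with (inverted) dropout kept on at
    # test time -- the Monte Carlo dropout approximation to a BNN.
    h = np.maximum(x @ W1, 0.0)
    h *= (rng.random(h.shape) > drop_p) / (1.0 - drop_p)
    logits = h @ W2
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

x = rng.normal(size=(1, 16))                 # one CAN-message feature vector
probs = np.stack([mc_forward(x) for _ in range(100)])
print("p(attack):", probs.mean(0)[0, 1].round(3),
      "+/-", probs.std(0)[0, 1].round(3))    # a wide spread flags low confidence
```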
-
|
Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning
(
Poster
)
link »
High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally, we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. https://github.com/google/uncertainty-baselines |
Zachary Nado · Neil Band · Mark Collier · Josip Djolonga · Mike Dusenberry · Sebastian Farquhar · Qixuan Feng · Angelos Filos · Marton Havasi · Rodolphe Jenatton · Ghassen Jerfel · Jeremiah Liu · Zelda Mariet · Jeremy Nixon · Shreyas Padhy · Jie Ren · Tim G. J. Rudner · Yeming Wen · Florian Wenzel · Kevin Murphy · D. Sculley · Balaji Lakshminarayanan · Jasper Snoek · Yarin Gal · Dustin Tran
|
-
|
Generation of data on discontinuous manifolds via continuous stochastic non-invertible networks
(
Poster
)
link »
The generation of discontinuous distributions is a difficult task for most known frameworks, such as generative autoencoders and generative adversarial networks. Generative non-invertible models are unable to accurately generate such distributions, require long training and often are subject to mode collapse. Variational autoencoders (VAEs), which are based on the idea of keeping the latent space to be Gaussian for the sake of a simple sampling, allow an accurate reconstruction, while they experience significant limitations at generation level. In this work, instead of trying to keep the latent space Gaussian, we use a pretrained contrastive encoder to obtain a clustered latent space. Then, for each cluster, representing a unimodal submanifold, we train a dedicated low complexity network to generate it from the Gaussian distribution. The proposed framework is based on the information-theoretic formulation of mutual information maximization between the input data and latent space representation. We derive a link between the cost functions and the information-theoretic formulation. We apply our approach to synthetic 2D distributions to demonstrate both reconstruction and generation of discontinuous distributions using continuous stochastic networks. |
Mariia Drozdova · Vitaliy Kinakh · Guillaume Quétant · Tobias Golling · Slava Voloshynovskiy 🔗 |
-
|
Uncertainty Quantification in End-to-End Implicit Neural Representations for Medical Imaging
(
Poster
)
link »
Implicit neural representations (INRs) have recently achieved impressive results in image representation. This work explores the uncertainty quantification quality of INRs for medical imaging. We propose the first uncertainty-aware, end-to-end INR architecture for computed tomography (CT) image reconstruction. Four established neural network uncertainty quantification techniques -- deep ensembles, Monte Carlo dropout, Bayes-by-backpropagation, and Hamiltonian Monte Carlo -- are implemented and assessed according to both image reconstruction quality and model calibration. We find that these INRs outperform traditional medical image reconstruction algorithms according to predictive accuracy; deep ensembles of Monte Carlo dropout base-learners achieve the best image reconstruction and model calibration among the techniques tested; activation function and random Fourier feature embedding frequency have large effects on model performance; and Bayes-by-backpropagation is ill-suited for sampling from the INR posterior distributions. Preliminary results further indicate that, with adequate tuning, Hamiltonian Monte Carlo may outperform Monte Carlo dropout deep ensembles. |
Francisca Vasconcelos · Bobby He · Yee Teh 🔗 |
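One of the assessed combinations, Monte Carlo dropout on a Fourier-feature INR, can be sketched in a few lines; the random untrained weights, layer sizes, and embedding scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 10.0 * rng.normal(size=(2, 64))   # random Fourier embedding; its frequency
                                      # scale is one knob the abstract flags
W1, W2 = 0.1 * rng.normal(size=(128, 64)), 0.1 * rng.normal(size=(64, 1))

def inr(coords, drop_p=0.1):
    # coords: (n, 2) pixel locations -> predicted CT intensities, with
    # MC dropout active at inference time for uncertainty estimates.
    feats = np.concatenate([np.sin(coords @ B), np.cos(coords @ B)], axis=-1)
    h = np.maximum(feats @ W1, 0.0)
    h *= (rng.random(h.shape) > drop_p) / (1.0 - drop_p)
    return h @ W2

xy = rng.random((5, 2))
samples = np.stack([inr(xy) for _ in range(50)])
print(samples.mean(0).ravel().round(2))   # reconstruction
print(samples.std(0).ravel().round(2))    # per-pixel uncertainty
```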
-
|
Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data
(
Poster
)
link »
Most machine learning models operate under the assumption that the training, testing and deployment data are independent and identically distributed (i.i.d.). This assumption doesn't generally hold true in a natural setting: usually, the deployment data are subject to various kinds of distributional shift, and the degradation in a model's performance is proportional to the magnitude of this shift. It is therefore necessary to evaluate a model's uncertainty and robustness to distributional shift to obtain a realistic estimate of its expected performance on real-world data. Existing methods for evaluating uncertainty and robustness are lacking and often fail to paint the full picture. Moreover, most analysis so far has primarily focused on classification tasks. In this paper, we propose more insightful metrics for general regression tasks using the Shifts Weather Prediction Dataset. We also present an evaluation of the baseline methods using these metrics. |
Kumud Lakara · Akshat Bhandari · Pratinav Seth · Ujjwal Verma 🔗 |
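One widely used uncertainty metric for regression under shift is the area under the error-retention curve; the sketch below is an illustrative approximation, not necessarily the paper's exact metric definition.

```python
import numpy as np

def retention_auc(errors, uncertainty):
    # Rank predictions from most to least certain; at retention fraction
    # r, keep the r most-certain points and average their error. The mean
    # over all retention levels approximates the area under the
    # error-retention curve (lower is better).
    order = np.argsort(uncertainty)
    errs = np.asarray(errors, dtype=float)[order]
    return (np.cumsum(errs) / np.arange(1, len(errs) + 1)).mean()

rng = np.random.default_rng(0)
err = rng.gamma(2.0, 1.0, size=1000)
informative = err + rng.normal(0.0, 0.5, size=1000)   # tracks the error
uninformative = rng.random(1000)                      # ignores the error
print(retention_auc(err, informative) < retention_auc(err, uninformative))  # True
```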
-
|
Generalization Gap in Amortized Inference
(
Poster
)
link »
The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications. The Variational Auto-Encoder (VAE) is a popular class of latent variable model used for many such applications including density estimation, representation learning and lossless compression. In this work, we highlight how the common use of amortized inference to scale the training of VAE models to large data sets can be a major cause of poor generalization performance. We propose a new training phase for the inference network that helps reduce over-fitting to training data. We demonstrate how the proposed scheme can improve generalization performance in the context of image modeling. |
Mingtian Zhang · Peter Hayes · David Barber 🔗 |
-
|
Information-theoretic stochastic contrastive conditional GAN: InfoSCC-GAN
(
Poster
)
link »
Conditional generation is a subclass of generative problems where the output of the generation is conditioned by the attribute information. In this paper, we present a stochastic contrastive conditional generative adversarial network (InfoSCC-GAN) with an explorable latent space. The InfoSCC-GAN architecture is based on an unsupervised contrastive encoder built on the InfoNCE paradigm, an attribute classifier, and an EigenGAN generator. We propose a novel training method, based on generator regularization using external or internal attributes every $n$-th iteration, using a pre-trained contrastive encoder and a pre-trained classifier. The proposed InfoSCC-GAN is derived based on an information-theoretic formulation of mutual information maximization between the input data and latent space representation as well as latent space and generated data. Thus, we demonstrate a link between the training objective functions and the above information-theoretic formulation. The experimental results show that InfoSCC-GAN outperforms the "vanilla" EigenGAN in image generation on several datasets. In addition, we investigate the impact of regularization techniques, discriminator architectures, and loss functions by performing ablation studies. Finally, we demonstrate that, thanks to the EigenGAN generator, the proposed framework enjoys stochastic generation in contrast to vanilla deterministic GANs, yet with independent training of the encoder, classifier, and generator in contrast to existing frameworks. Code, experimental results, and demos are available at \url{https://anonymous.4open.science/r/InfoSCC-GAN-D113}.
|
Vitaliy Kinakh · Mariia Drozdova · Guillaume Quétant · Tobias Golling · Slava Voloshynovskiy 🔗 |
-
|
Deep Classifiers with Label Noise Modeling and Distance Awareness
(
Poster
)
link »
Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications. While there have been many proposed methods that either focus on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distribution calibration, both of these types of uncertainty are often necessary. In this work, we propose the HetSNGP method for jointly modeling the model and data uncertainty. We show that our proposed model affords a favorable combination between these two complementary types of uncertainty and thus outperforms the baseline methods on some challenging out-of-distribution datasets, including CIFAR-100C, Imagenet-C, and Imagenet-A. Moreover, we propose HetSNGP Ensemble, an ensembled version of our method which adds an additional type of uncertainty and also outperforms other ensemble baselines. |
Vincent Fortuin · Mark Collier · Florian Wenzel · James Allingham · Jeremiah Liu · Dustin Tran · Balaji Lakshminarayanan · Jesse Berent · Rodolphe Jenatton · Effrosyni Kokiopoulou 🔗 |
-
|
Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks
(
Poster
)
link »
Bayesian deep learning seeks to equip deep neural networks with the ability to precisely quantify their predictive uncertainty, and has promised to make deep learning more reliable for safety-critical real-world applications. Yet, existing Bayesian deep learning methods fall short of this promise; new methods continue to be evaluated on unrealistic test beds that do not reflect the complexities of the downstream real-world tasks that would benefit most from reliable uncertainty quantification. We propose a set of real-world tasks that accurately reflect such complexities and assess the reliability of predictive models in safety-critical scenarios. Specifically, we curate two publicly available datasets of high-resolution human retina images exhibiting varying degrees of diabetic retinopathy, a medical condition that can lead to blindness, and use them to design a suite of automated diagnosis tasks that require reliable predictive uncertainty quantification. We use these tasks to benchmark well-established and state-of-the-art Bayesian deep learning methods on task-specific evaluation metrics. We provide an easy-to-use codebase for fast and easy benchmarking following reproducibility and software design principles. |
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal 🔗 |
-
|
Stochastic Local Winner-Takes-All Networks Enable Profound Adversarial Robustness
(
Poster
)
link »
This work explores the potency of stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA), against powerful (gradient-based) white-box and black-box adversarial attacks; we especially focus on Adversarial Training settings. In our work, we replace the conventional ReLU-based nonlinearities with blocks comprising locally and stochastically competing linear units. Each network layer now yields a sparse output, depending on the outcome of winner sampling in each block. We rely on the Variational Bayesian framework for training and inference; we incorporate conventional PGD-based adversarial training arguments to increase the overall adversarial robustness. As we experimentally show, the arising networks yield state-of-the-art robustness against powerful adversarial attacks while retaining a very high classification rate in the benign case. |
Konstantinos Panousis · Sotirios Chatzis · Sergios Theodoridis 🔗 |
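The competing-units idea can be sketched as follows; Gumbel-max winner sampling over blocks of linear activations stands in here for the paper's variational posterior over winner indicators, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_lwta(h, block_size=2, tau=1.0):
    # h: (batch, features); features are grouped into competing blocks.
    b, f = h.shape
    blocks = h.reshape(b, f // block_size, block_size)
    # Sample one winner per block via Gumbel-max over the linear
    # activations; only the winner's activation passes through.
    g = rng.gumbel(size=blocks.shape)
    winners = np.argmax(blocks / tau + g, axis=-1)
    mask = np.eye(block_size)[winners]          # one-hot winner mask
    return (blocks * mask).reshape(b, f)        # sparse layer output

h = rng.normal(size=(4, 8))
print(stochastic_lwta(h))  # exactly one nonzero entry per block of two
```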
-
|
Being a Bit Frequentist Improves Bayesian Neural Networks
(
Poster
)
link »
Despite their compelling theoretical properties, Bayesian neural networks (BNNs) tend to perform worse than frequentist methods in classification-based uncertainty quantification (UQ) tasks such as out-of-distribution (OOD) detection. In this paper, based on empirical findings in prior works, we hypothesize that this issue is because even recent Bayesian methods have never considered OOD data in their training processes, even though this ``OOD training'' technique is an integral part of state-of-the-art frequentist UQ methods. To validate this, we treat OOD data as a first-class citizen in BNN training by exploring several ways of incorporating OOD data in Bayesian inference. We show in experiments that OOD-trained BNNs are competitive with, if not better than, recent frequentist baselines. This work thus provides strong baselines for future work in Bayesian deep learning. |
Agustinus Kristiadi · Matthias Hein · Philipp Hennig 🔗 |
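One simple way to make OOD data "first class" in training, shown purely as an illustration, is to add a term pulling OOD predictions toward the uniform distribution; the weighting and the uniform target are assumptions, not necessarily the paper's exact objective.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(-1, keepdims=True)
    return z - np.log(np.exp(z).sum(-1, keepdims=True))

def ood_augmented_loss(logits_in, labels_in, logits_ood, weight=0.5):
    # Standard cross-entropy on in-distribution data ...
    nll = -log_softmax(logits_in)[np.arange(len(labels_in)), labels_in].mean()
    # ... plus cross-entropy of OOD predictions against the uniform
    # distribution, i.e. the model is pushed to be maximally uncertain
    # on OOD inputs (the 0.5 weight is illustrative).
    ood_term = -log_softmax(logits_ood).mean()
    return nll + weight * ood_term

rng = np.random.default_rng(0)
print(ood_augmented_loss(rng.normal(size=(8, 10)),
                         rng.integers(0, 10, size=8),
                         rng.normal(size=(8, 10))))
```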
-
|
Reproducible, incremental representation learning with Rosetta VAE
(
Poster
)
link »
Variational autoencoders are among the most popular methods for distilling low-dimensional structure from high-dimensional data, making them increasingly valuable as tools for data exploration and scientific discovery. However, unlike typical machine learning problems, in which a single model is trained once on a single large dataset, scientific workflows privilege learned features that are reproducible, portable across labs, and capable of incrementally adding new data. Ideally, methods used by different research groups should produce comparable results, even without sharing fully-trained models or entire data sets. Here, we address this challenge by introducing the Rosetta VAE (R-VAE), a method of distilling previously learned representations and retraining new models to reproduce and build on prior results. The R-VAE uses post hoc clustering over the latent space of a fully-trained model to identify a small number of Rosetta Points (input, latent pairs) to serve as anchors for training future models. An adjustable hyperparameter balances fidelity to the previously learned latent space against accommodation of new data. We demonstrate that the R-VAE reconstructs data as well as the VAE and β-VAE, outperforms both methods in recovery of a target latent space in a sequential training setting, and dramatically increases consistency of the learned representation across training runs. Similar to other VAE methods, the R-VAE makes few assumptions about the data and underlying distributions, uses the same number of hyperparameters as the β-VAE, and provides a simple and intuitive solution to stable and consistent retraining. |
Miles Martinez · John Pearson 🔗 |
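The anchoring idea can be sketched as a penalty added to the usual VAE loss; the quadratic form, the toy linear encoder, and the name `rosetta_penalty` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rosetta_penalty(encode, rosetta_pairs, weight=1.0):
    # rosetta_pairs: (input, latent) anchors distilled from a previously
    # trained model; weight trades fidelity to the old latent space
    # against accommodation of new data.
    return weight * np.mean([np.sum((encode(x) - z) ** 2)
                             for x, z in rosetta_pairs])

# Toy linear "encoder" and two anchor pairs, purely for illustration.
E = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, -0.5]])
encode = lambda x: E @ x
pairs = [(np.ones(3), np.array([1.0, 1.0])),
         (np.zeros(3), np.zeros(2))]
print(rosetta_penalty(encode, pairs))   # added to the usual VAE objective
```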
-
|
An Empirical Comparison of GANs and Normalizing Flows for Density Estimation
(
Poster
)
link »
Generative adversarial networks (GANs) and normalizing flows are both approaches to density estimation that use deep neural networks to transform samples from an uninformative prior distribution to an approximation of the data distribution. There is great interest in both for general-purpose statistical modeling, but the two approaches have seldom been compared to each other for modeling non-image data. The difficulty of computing likelihoods with GANs, which are implicit models, makes conducting such a comparison challenging. We work around this difficulty by considering several low-dimensional synthetic datasets. An extensive grid search over GAN architectures, hyperparameters, and training procedures suggests that no GAN is capable of modeling our simple low-dimensional data well, a task we view as a prerequisite for an approach to be considered suitable for general-purpose statistical modeling. Several normalizing flows, on the other hand, excelled at these tasks, even substantially outperforming WGAN in terms of Wasserstein distance---the metric that WGAN alone targets. Scientists and other practitioners should be wary of relying on WGAN for applications that require accurate density estimation. |
TIanci Liu · Jeffrey Regier 🔗 |
-
|
Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks
(
Poster
)
link »
We consider the problem of the stability of saliency-based explanations of neural network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic neural networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian neural networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high-level representations of the inputs in the internal layers of a network. Our results demonstrate that Bayesian methods, in addition to being more robust to adversarial attacks, have the potential to provide more stable and interpretable assessments of neural network predictions. |
Ginevra Carbone · Luca Bortolussi · Guido Sanguinetti 🔗 |
-
|
Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data
(
Poster
)
link »
High-dimensional classification and feature selection tasks are ubiquitous with the recent advancement in data acquisition technology. In several application areas such as biology, genomics and proteomics, the data are often functional in their nature and exhibit a degree of roughness and non-stationarity. These structures pose additional challenges to commonly used methods that rely mainly on a two-stage approach performing variable selection and classification separately. We propose in this work a novel Gaussian process discriminant analysis (GPDA) that combines these steps in a unified framework. Our model is a two-layer non-stationary Gaussian process coupled with an Ising prior to identify differentially-distributed locations. Scalable inference is achieved via developing a variational scheme that exploits advances in the use of sparse inverse covariance matrices. We demonstrate the performance of our methodology on simulated datasets and two proteomics datasets: breast cancer and SARS-CoV-2. Our approach distinguishes itself by offering explainability as well as uncertainty quantification in addition to low computational cost, which are crucial to increase trust and social acceptance of data-driven tools. |
Weichang Yu · Sara Wade · Howard Bondell · Lamiae Azizi 🔗 |
-
|
Pathologies in Priors and Inference for Bayesian Transformers
(
Poster
)
link »
In recent years, the transformer has established itself as a workhorse in many applications ranging from natural language processing to reinforcement learning. Similarly, Bayesian deep learning has become the gold-standard for uncertainty estimation in safety-critical applications, where robustness and calibration are crucial. Surprisingly, no successful attempts to improve transformer models in terms of predictive uncertainty using Bayesian inference exist. In this work, we study this curiously underpopulated area of Bayesian transformers. We find that weight-space inference in transformers does not work well, regardless of the approximate posterior. We also find that the prior is at least partially at fault, but that it is very hard to find well-specified weight priors for these models. We hypothesize that these problems stem from the complexity of obtaining a meaningful mapping from weight-space to function-space distributions in the transformer. Therefore, moving closer to function-space, we propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights. We find that this proposed method performs competitively with our baselines. |
Tristan Cinquin · Alexander Immer · Max Horn · Vincent Fortuin 🔗 |
-
|
Analytically Tractable Inference in Neural Networks - An Alternative to Backpropagation
(
Poster
)
link »
Until now, neural networks have predominantly relied on backpropagation and gradient descent as the inference engine to learn a network's parameters, primarily because closed-form Bayesian inference for neural networks has been considered intractable. This short paper outlines a new analytical method for performing tractable approximate Gaussian inference (TAGI) in Bayesian neural networks. The method enables the analytical inference of the posterior mean vector and diagonal covariance matrix for weights and biases. One key aspect is that the method matches or exceeds state-of-the-art performance while having the same computational complexity as current methods relying on gradient backpropagation, i.e., linear complexity with respect to the number of parameters in the network. Performing Bayesian inference in neural networks enables several key features, such as the quantification of epistemic uncertainty associated with model parameters, the online estimation of parameters, and a reduction in the number of hyperparameters due to the absence of gradient-based optimization. Moreover, the proposed analytical framework enables unprecedented features such as the propagation of uncertainty from the input of a network up to its output, and it allows inferring the value of hidden states, inputs, as well as latent variables. The first part covers the theoretical foundation and working principles of the analytically tractable uncertainty propagation in neural networks, as well as the parameter and hidden state inference. The second part goes through benchmarks demonstrating the superiority of the approach on supervised, unsupervised, and reinforcement learning tasks. In addition, we showcase how TAGI can be applied to reinforcement learning problems such as the Atari game environment. Finally, the last part presents how we can leverage the analytic inference capabilities of our approach to enable novel applications of neural networks, such as closed-form direct adversarial attacks and the usage of a neural network as a generic black-box optimization method. |
Luong-Ha Nguyen · James-A. Goulet 🔗 |
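The core operation, analytically propagating means and variances through a layer, is standard Gaussian algebra; below is a sketch for a single linear layer under the diagonal-Gaussian independence assumptions the method relies on (not the authors' implementation; activation layers need further approximations not shown here).

```python
import numpy as np

def linear_gaussian_moments(mu_w, var_w, mu_x, var_x, mu_b=0.0, var_b=0.0):
    # Exact mean and variance of y = W x + b when the weights, inputs and
    # bias are independent Gaussians with the given means and variances.
    mu_y = mu_w @ mu_x + mu_b
    var_y = var_w @ (var_x + mu_x ** 2) + (mu_w ** 2) @ var_x + var_b
    return mu_y, var_y

mu_w = np.array([[0.5, -0.2],
                 [0.1, 0.3]])
var_w = np.full((2, 2), 0.01)
mu_x, var_x = np.array([1.0, 2.0]), np.array([0.1, 0.1])
print(linear_gaussian_moments(mu_w, var_w, mu_x, var_x))
```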
-
|
Infinite-channel deep convolutional Stable neural networks
(
Poster
)
link »
The connection between infinite-width neural networks (NNs) and Gaussian processes (GPs) is well known since the seminal work of Neal (1996). While numerous theoretical refinements have been proposed in recent years, the connection between NNs and GPs relies on two critical distributional assumptions on the NN's parameters: i) finite variance; ii) independent and identical distribution (i.i.d.). In this paper, we consider the problem of removing assumption i) in the context of deep feed-forward convolutional NNs. We show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a stochastic process with multivariate stable finite-dimensional distributions, and we give an explicit recursion over the layers for their parameters. Our contribution extends recent results of Favaro et al. (2021) to convolutional architectures, and it paves the way to exciting lines of research that rely on GP limits. |
Daniele Bracale · Stefano Favaro · Sandra Fortini · Stefano Peluchetti 🔗 |
-
|
Unveiling Mode-connectivity of the ELBO Landscape
(
Poster
)
link »
We demonstrate and discuss mode-connectivity of the ELBO, the objective function of variational inference (VI). Local optima of the ELBO are found to be connected by essentially flat maximum energy paths (MEPs), suggesting that optima of the ELBO are not discrete modes but lie on a connected subset in parameter space. We focus on Latent Dirichlet Allocation, a model commonly fit with VI. Our findings parallel recent results showing mode-connectivity of neural net loss functions, a property that has helped explain and improve the performance of neural nets. We find MEPs between maxima of the ELBO using the simplified string method (SSM), a gradient-based algorithm that updates images along a path on the ELBO. The mode-connectivity property is explained with a heuristic argument about statistical degeneracy, related to over-parametrization in neural networks. This study corroborates and extends the empirical experience that topic modeling has many optima, providing a loss-landscape-based explanation for the ``no best answer'' phenomenon experienced by practitioners of LDA. |
Edith Zhang · David Blei 🔗 |
Author Information
Yarin Gal (University of Oxford)
Yingzhen Li (Imperial College London)
Yingzhen Li is a senior researcher at Microsoft Research Cambridge. She received her PhD from the University of Cambridge, and previously she has interned at Disney Research. She is passionate about building reliable machine learning systems, and her approach combines both Bayesian statistics and deep learning. Her contributions to the approximate inference field include: (1) algorithmic advances, such as variational inference with different divergences, combining variational inference with MCMC and approximate inference with implicit distributions; (2) applications of approximate inference, such as uncertainty estimation in Bayesian neural networks and algorithms to train deep generative models. She has served as area chairs at NeurIPS/ICML/ICLR/AISTATS on related research topics, and she is a co-organizer of the AABI2020 symposium, a flagship event of approximate inference.
Sebastian Farquhar (University of Oxford)
Christos Louizos (Qualcomm AI Research)
Eric Nalisnick (University of Amsterdam)
Andrew Gordon Wilson (New York University)
Zoubin Ghahramani (Uber and University of Cambridge)
Zoubin Ghahramani is Professor of Information Engineering at the University of Cambridge, where he leads the Machine Learning Group. He studied computer science and cognitive science at the University of Pennsylvania, obtained his PhD from MIT in 1995, and was a postdoctoral fellow at the University of Toronto. His academic career includes concurrent appointments as one of the founding members of the Gatsby Computational Neuroscience Unit in London, and as a faculty member of CMU's Machine Learning Department for over 10 years. His current research interests include statistical machine learning, Bayesian nonparametrics, scalable inference, probabilistic programming, and building an automatic statistician. He has held a number of leadership roles as programme and general chair of the leading international conferences in machine learning including: AISTATS (2005), ICML (2007, 2011), and NIPS (2013, 2014). In 2015 he was elected a Fellow of the Royal Society.
Kevin Murphy (Google)
Max Welling (University of Amsterdam / Qualcomm AI Research)
More from the Same Authors
-
2020 : Paper 40: Real2sim: Automatic Generation of Open Street Map Towns For Autonomous Driving Benchmarks »
Panagiotis Tigas · Yarin Gal -
2020 Meetup: MeetUp: Oxford, UK »
Yarin Gal -
2021 Spotlight: Speedy Performance Estimation for Neural Architecture Search »
Robin Ru · Clare Lyle · Lisa Schut · Miroslav Fil · Mark van der Wilk · Yarin Gal -
2021 : Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks »
Andrey Malinin · Neil Band · Yarin Gal · Mark Gales · Alexander Ganshin · German Chesnokov · Alexey Noskov · Andrey Ploskonosov · Liudmila Prokhorenkova · Ivan Provilkov · Vatsal Raina · Vyas Raina · Denis Roginskiy · Mariya Shmatova · Panagiotis Tigas · Boris Yangel -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 : DeDUCE: Generating Counterfactual Explanations At Scale »
Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal -
2021 : Robust Reinforcement Learning for Shifting Dynamics During Deployment »
Samuel Stanton · Rasool Fakoor · Jonas Mueller · Andrew Gordon Wilson · Alexander Smola -
2021 : Accurate Imputation and Efficient Data Acquisition with Transformer-based VAEs »
Sarah Lewis · Tatiana Matejovicova · Yingzhen Li · Angus Lamb · Yordan Zaykov · Miltiadis Allamanis · Cheng Zhang -
2021 : Using Non-Linear Causal Models to Study Aerosol-Cloud Interactions in the Southeast Pacific »
Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier -
2021 : DARTS without a Validation Set: Optimizing the Marginal Likelihood »
Miroslav Fil · Robin Ru · Clare Lyle · Yarin Gal -
2021 : Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data »
Sindy Löwe · David Madras · Richard Zemel · Max Welling -
2021 : Can Network Flatness Explain the Training Speed-Generalisation Connection? »
Albert Q. Jiang · Clare Lyle · Lisa Schut · Yarin Gal -
2021 : Decomposing Representations for Deterministic Uncertainty Estimation »
Haiwen Huang · Joost van Amersfoort · Yarin Gal -
2021 : On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty »
Joost van Amersfoort · Lewis Smith · Andrew Jesson · Oscar Key · Yarin Gal -
2021 : Contrastive Representation Learning with Trainable Augmentation Channel »
Masanori Koyama · Kentaro Minami · Takeru Miyato · Yarin Gal -
2021 : Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning »
Zachary Nado · Neil Band · Mark Collier · Josip Djolonga · Mike Dusenberry · Sebastian Farquhar · Qixuan Feng · Angelos Filos · Marton Havasi · Rodolphe Jenatton · Ghassen Jerfel · Jeremiah Liu · Zelda Mariet · Jeremy Nixon · Shreyas Padhy · Jie Ren · Tim G. J. Rudner · Yeming Wen · Florian Wenzel · Kevin Murphy · D. Sculley · Balaji Lakshminarayanan · Jasper Snoek · Yarin Gal · Dustin Tran -
2021 : Particle Dynamics for Learning EBMs »
Kirill Neklyudov · Priyank Jaini · Max Welling -
2022 Poster: Scalable Infomin Learning »
Yanzhi Chen · weihao sun · Yingzhen Li · Adrian Weller -
2022 : Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning »
Zeel B Patel · Nipun Batra · Kevin Murphy -
2022 : Discovering Long-period Exoplanets using Deep Learning with Citizen Science Labels »
Shreshth A Malik · Nora Eisner · Chris Lintott · Yarin Gal -
2022 : PIPS: Path Integral Stochastic Optimal Control for Path Sampling in Molecular Dynamics »
Lars Holdijk · Yuanqi Du · Ferry Hooft · Priyank Jaini · Berend Ensing · Max Welling -
2022 : Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design »
Ilia Igashov · Hannes Stärk · Clément Vignac · Victor Garcia Satorras · Pascal Frossard · Max Welling · Michael Bronstein · Bruno Correia -
2022 : Decentralized Learning with Random Walks and Communication-Efficient Adaptive Optimization »
Aleksei Triastcyn · Matthias Reisser · Christos Louizos -
2022 : Program Synthesis for Integer Sequence Generation »
Natasha Butt · Auke Wiggers · Taco Cohen · Max Welling -
2022 : Structure-based Drug Design with Equivariant Diffusion Models »
Arne Schneuing · Yuanqi Du · Charles Harris · Arian Jamasb · Ilia Igashov · weitao Du · Tom Blundell · Pietro Lió · Carla Gomes · Max Welling · Michael Bronstein · Bruno Correia -
2022 : Using uncertainty-aware machine learning models to study aerosol-cloud interactions »
Maëlys Solal · Andrew Jesson · Yarin Gal · Alyson Douglas -
2022 : TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction »
Pascal Notin · Lodevicus van Niekerk · Aaron Kollasch · Daniel Ritter · Yarin Gal · Debora Marks -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2022 : Learning Generative Models with Invariance to Symmetries »
James Allingham · Javier Antorán · Shreyas Padhy · Eric Nalisnick · José Miguel Hernández-Lobato -
2022 : What 'Out-of-distribution' Is and Is Not »
Sebastian Farquhar · Yarin Gal -
2022 : Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation »
Lorenz Kuhn · Yarin Gal · Sebastian Farquhar -
2022 : On Representation Learning Under Class Imbalance »
Ravid Shwartz-Ziv · Micah Goldblum · Yucen Li · C. Bayan Bruss · Andrew Gordon Wilson -
2022 Spotlight: Alleviating Adversarial Attacks on Variational Autoencoders with MCMC »
Anna Kuzina · Max Welling · Jakub Tomczak -
2022 Spotlight: Machine Learning on Graphs: A Model and Comprehensive Taxonomy »
Ines Chami · Sami Abu-El-Haija · Bryan Perozzi · Christopher Ré · Kevin Murphy -
2022 : Andrew Gordon Wilson: When Bayesian Orthodoxy Can Go Wrong: Model Selection and Out-of-Distribution Generalization »
Andrew Gordon Wilson -
2022 : Invited Speaker »
Max Welling -
2022 : Invited Talk #4, The Fifth Paradigm of Scientific Discovery, Max Welling »
Max Welling -
2022 : Poster session 1 »
Yingzhen Li -
2022 Workshop: NeurIPS 2022 Workshop on Score-Based Methods »
Yingzhen Li · Yang Song · Valentin De Bortoli · Francois-Xavier Briol · Wenbo Gong · Alexia Jolicoeur-Martineau · Arash Vahdat -
2022 Workshop: AI for Science: Progress and Promises »
Yi Ding · Yuanqi Du · Tianfan Fu · Hanchen Wang · Anima Anandkumar · Yoshua Bengio · Anthony Gitter · Carla Gomes · Aviv Regev · Max Welling · Marinka Zitnik -
2022 Poster: Batch Bayesian Optimization on Permutations using the Acquisition Weighted Kernel »
Changyong Oh · Roberto Bondesan · Efstratios Gavves · Max Welling -
2022 Poster: Tractable Function-Space Variational Inference in Bayesian Neural Networks »
Tim G. J. Rudner · Zonghao Chen · Yee Whye Teh · Yarin Gal -
2022 Poster: Repairing Neural Networks by Leaving the Right Past Behind »
Ryutaro Tanno · Melanie F. Pradier · Aditya Nori · Yingzhen Li -
2022 Poster: Alleviating Adversarial Attacks on Variational Autoencoders with MCMC »
Anna Kuzina · Max Welling · Jakub Tomczak -
2022 Poster: Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions »
Andrew Jesson · Alyson Douglas · Peter Manshausen · Maëlys Solal · Nicolai Meinshausen · Philip Stier · Yarin Gal · Uri Shalit -
2022 Poster: Interventions, Where and How? Experimental Design for Causal Models at Scale »
Panagiotis Tigas · Yashas Annadani · Andrew Jesson · Bernhard Schölkopf · Yarin Gal · Stefan Bauer -
2022 Poster: Machine Learning on Graphs: A Model and Comprehensive Taxonomy »
Ines Chami · Sami Abu-El-Haija · Bryan Perozzi · Christopher Ré · Kevin Murphy -
2022 Poster: On the symmetries of the synchronization problem in Cryo-EM: Multi-Frequency Vector Diffusion Maps on the Projective Plane »
Gabriele Cesa · Arash Behboodi · Taco Cohen · Max Welling -
2022 Poster: Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Thomas Rainforth -
2022 Poster: Learning Neural Set Functions Under the Optimal Subset Oracle »
Zijing Ou · Tingyang Xu · Qinliang Su · Yingzhen Li · Peilin Zhao · Yatao Bian -
2021 : Human-in-the-loop Bayesian Deep Learning »
Yarin Gal -
2021 : [S7] DeDUCE: Generating Counterfactual Explanations At Scale »
Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal -
2021 : General Discussion 1 - What is out of distribution (OOD) generalization and why is it important? with Yoshua Bengio, Leyla Isik, Max Welling »
Yoshua Bengio · Leyla Isik · Max Welling · Joshua T Vogelstein · Weiwei Yang -
2021 : Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders »
T. Anderson Keller · Qinghe Gao · Max Welling -
2021 : Live Panel »
Max Welling · Bharath Ramsundar · Irina Rish · Karianne J Bergen · Pushmeet Kohli -
2021 : Session 1 | Invited talk: Max Welling, "Accelerating simulations of nature, both classical and quantum, with equivariant deep learning" »
Max Welling · Atilim Gunes Baydin -
2021 Workshop: AI for Science: Mind the Gaps »
Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik -
2021 Poster: Speedy Performance Estimation for Neural Architecture Search »
Robin Ru · Clare Lyle · Lisa Schut · Miroslav Fil · Mark van der Wilk · Yarin Gal -
2021 Poster: Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions »
Emiel Hoogeboom · Didrik Nielsen · Priyank Jaini · Patrick Forré · Max Welling -
2021 Poster: Sparse Uncertainty Representation in Deep Learning with Inducing Weights »
Hippolyt Ritter · Martin Kukla · Cheng Zhang · Yingzhen Li -
2021 Poster: Topographic VAEs learn Equivariant Capsules »
T. Anderson Keller · Max Welling -
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A »
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar -
2021 : Unsupervised Indoor Wi-Fi Positioning »
Farhad G. Zanjani · Ilia Karmanov · Hanno Ackermann · Daniel Dijkman · Max Welling · Ishaque Kadampot · Simone Merlin · Steve Shellhammer · Rui Liang · Brian Buesker · Harshit Joshi · Vamsi Vegunta · Raamkumar Balamurthi · Bibhu Mohanty · Joseph Soriaga · Ron Tindall · Pat Lawlor -
2021 Poster: Outcome-Driven Reinforcement Learning via Variational Inference »
Tim G. J. Rudner · Vitchyr Pong · Rowan McAllister · Yarin Gal · Sergey Levine -
2021 Poster: Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent »
Priyank Jaini · Lars Holdijk · Max Welling -
2021 Poster: E(n) Equivariant Normalizing Flows »
Victor Garcia Satorras · Emiel Hoogeboom · Fabian Fuchs · Ingmar Posner · Max Welling -
2021 Poster: Improving black-box optimization in VAE latent space using decoder uncertainty »
Pascal Notin · José Miguel Hernández-Lobato · Yarin Gal -
2021 Poster: Modality-Agnostic Topology Aware Localization »
Farhad Ghazvinian Zanjani · Ilia Karmanov · Hanno Ackermann · Daniel Dijkman · Simone Merlin · Max Welling · Fatih Porikli -
2021 Poster: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations »
Tim G. J. Rudner · Cong Lu · Michael A Osborne · Yarin Gal · Yee Teh -
2021 : Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift + Q&A »
Andrey Malinin · Neil Band · German Chesnokov · Yarin Gal · Alexander Ganshin · Mark Gales · Alexey Noskov · Liudmila Prokhorenkova · Mariya Shmatova · Vyas Raina · Vatsal Raina · Panagiotis Tigas · Boris Yangel -
2021 Poster: Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data »
Andrew Jesson · Panagiotis Tigas · Joost van Amersfoort · Andreas Kirsch · Uri Shalit · Yarin Gal -
2021 Poster: Domain Invariant Representation Learning with Domain Density Transformations »
A. Tuan Nguyen · Toan Tran · Yarin Gal · Atilim Gunes Baydin -
2021 Poster: Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning »
Jannik Kossen · Neil Band · Clare Lyle · Aidan Gomez · Thomas Rainforth · Yarin Gal -
2021 Poster: Deep Neural Networks as Point Estimates for Deep Gaussian Processes »
Vincent Dutordoir · James Hensman · Mark van der Wilk · Carl Henrik Ek · Zoubin Ghahramani · Nicolas Durrande -
2021 Oral: E(n) Equivariant Normalizing Flows »
Victor Garcia Satorras · Emiel Hoogeboom · Fabian Fuchs · Ingmar Posner · Max Welling -
2020 : Invited Talk: Max Welling - The LIAR (Learning with Interval Arithmetic Regularization) is Dead »
Max Welling -
2020 Poster: Natural Graph Networks »
Pim de Haan · Taco Cohen · Max Welling -
2020 Poster: SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks »
Fabian Fuchs · Daniel E Worrall · Volker Fischer · Max Welling -
2020 Poster: Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations »
Sebastian Farquhar · Lewis Smith · Yarin Gal -
2020 Poster: A Bayesian Perspective on Training Speed and Model Selection »
Clare Lyle · Lisa Schut · Robin Ru · Yarin Gal · Mark van der Wilk -
2020 Poster: On the Expressiveness of Approximate Inference in Bayesian Neural Networks »
Andrew Foong · David Burt · Yingzhen Li · Richard Turner -
2020 Poster: SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows »
Didrik Nielsen · Priyank Jaini · Emiel Hoogeboom · Ole Winther · Max Welling -
2020 Oral: SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows »
Didrik Nielsen · Priyank Jaini · Emiel Hoogeboom · Ole Winther · Max Welling -
2020 Poster: The Convolution Exponential and Generalized Sylvester Flows »
Emiel Hoogeboom · Victor Garcia Satorras · Jakub Tomczak · Max Welling -
2020 Poster: Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models »
Andrew Jesson · Sören Mindermann · Uri Shalit · Yarin Gal -
2020 Poster: Bayesian Bits: Unifying Quantization and Pruning »
Mart van Baalen · Christos Louizos · Markus Nagel · Rana Ali Amjad · Ying Wang · Tijmen Blankevoort · Max Welling -
2020 Poster: Experimental design for MRI by greedy policy search »
Tim Bakker · Herke van Hoof · Max Welling -
2020 Poster: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? »
Mrinank Sharma · Sören Mindermann · Jan Brauner · Gavin Leech · Anna Stephenson · Tomáš Gavenčiak · Jan Kulveit · Yee Whye Teh · Leonid Chindelevitch · Yarin Gal -
2020 Spotlight: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? »
Mrinank Sharma · Sören Mindermann · Jan Brauner · Gavin Leech · Anna Stephenson · Tomáš Gavenčiak · Jan Kulveit · Yee Whye Teh · Leonid Chindelevitch · Yarin Gal -
2020 Spotlight: Experimental design for MRI by greedy policy search »
Tim Bakker · Herke van Hoof · Max Welling -
2020 Poster: MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning »
Elise van der Pol · Daniel E Worrall · Herke van Hoof · Frans Oliehoek · Max Welling -
2020 Tutorial: (Track1) Advances in Approximate Inference »
Yingzhen Li · Cheng Zhang -
2019 : TBD »
Max Welling -
2019 : Poster session »
Sebastian Farquhar · Erik Daxberger · Andreas Look · Matt Benatan · Ruiyi Zhang · Marton Havasi · Fredrik Gustafsson · James A Brofos · Nabeel Seedat · Micha Livne · Ivan Ustyuzhaninov · Adam Cobb · Felix D McGregor · Patrick McClure · Tim R. Davidson · Gaurush Hiranandani · Sanjeev Arora · Masha Itkina · Didrik Nielsen · William Harvey · Matias Valdenegro-Toro · Stefano Peluchetti · Riccardo Moriconi · Tianyu Cui · Vaclav Smidl · Taylan Cemgil · Jack Fitzsimons · He Zhao · mariana vargas vieyra · Apratim Bhattacharyya · Rahul Sharma · Geoffroy Dubourg-Felonneau · Jonathan Warrell · Slava Voloshynovskiy · Mihaela Rosca · Jiaming Song · Andrew Ross · Homa Fashandi · Ruiqi Gao · Hooshmand Shokri Razaghi · Joshua Chang · Zhenzhong Xiao · Vanessa Boehm · Giorgio Giannone · Ranganath Krishnan · Joe Davison · Arsenii Ashukha · Jeremiah Liu · Sicong (Sheldon) Huang · Evgenii Nikishin · Sunho Park · Nilesh Ahuja · Mahesh Subedar · Artyom Gadetsky · Jhosimar Arias Figueroa · Tim G. J. Rudner · Waseem Aslam · Adrián Csiszárik · John Moberg · Ali Hebbal · Kathrin Grosse · Pekka Marttinen · Bang An · Hlynur Jónsson · Samuel Kessler · Abhishek Kumar · Mikhail Figurnov · Omesh Tickoo · Steindor Saemundsson · Ari Heljakka · Dániel Varga · Niklas Heim · Simone Rossi · Max Laves · Waseem Gharbieh · Nicholas Roberts · Luis Armando Pérez Rey · Matthew Willetts · Prithvijit Chakrabarty · Sumedh Ghaisas · Carl Shneider · Wray Buntine · Kamil Adamczewski · Xavier Gitiaux · Suwen Lin · Hao Fu · Gunnar Rätsch · Aidan Gomez · Erik Bodin · Dinh Phung · Lennart Svensson · Juliano Tusi Amaral Laganá Pinto · Milad Alizadeh · Jianzhun Du · Kevin Murphy · Beatrix Benkő · Shashaank Vattikuti · Jonathan Gordon · Christopher Kanan · Sontje Ihler · Darin Graham · Michael Teng · Louis Kirsch · Tomas Pevny · Taras Holotyak -
2019 : Keynote - ML »
Max Welling -
2019 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Eric Nalisnick · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2019 Poster: Invert to Learn to Invert »
Patrick Putzky · Max Welling -
2019 Poster: BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning »
Andreas Kirsch · Joost van Amersfoort · Yarin Gal -
2019 Poster: Deep Scale-spaces: Equivariance Over Scale »
Daniel Worrall · Max Welling -
2019 Poster: Integer Discrete Flows and Lossless Compression »
Emiel Hoogeboom · Jorn Peters · Rianne van den Berg · Max Welling -
2019 Poster: Exact Gaussian Processes on a Million Data Points »
Ke Alexander Wang · Geoff Pleiss · Jacob Gardner · Stephen Tyree · Kilian Weinberger · Andrew Gordon Wilson -
2019 Poster: Function-Space Distributions over Kernels »
Gregory Benton · Wesley Maddox · Jayson Salkey · Julio Albinati · Andrew Gordon Wilson -
2019 Poster: Bayesian Learning of Sum-Product Networks »
Martin Trapp · Robert Peharz · Hong Ge · Franz Pernkopf · Zoubin Ghahramani -
2019 Poster: The Functional Neural Process »
Christos Louizos · Xiahan Shi · Klamer Schutte · Max Welling -
2019 Poster: Language as an Abstraction for Hierarchical Deep Reinforcement Learning »
YiDing Jiang · Shixiang (Shane) Gu · Kevin Murphy · Chelsea Finn -
2019 Poster: Combining Generative and Discriminative Models for Hybrid Inference »
Victor Garcia Satorras · Zeynep Akata · Max Welling -
2019 Spotlight: Combining Generative and Discriminative Models for Hybrid Inference »
Victor Garcia Satorras · Max Welling · Zeynep Akata -
2019 Poster: Unsupervised learning of object structure and dynamics from videos »
Matthias Minderer · Chen Sun · Ruben Villegas · Forrester Cole · Kevin Murphy · Honglak Lee -
2019 Poster: Combinatorial Bayesian Optimization using the Graph Cartesian Product »
Changyong Oh · Jakub Tomczak · Stratis Gavves · Max Welling -
2019 Poster: A Simple Baseline for Bayesian Uncertainty in Deep Learning »
Wesley Maddox · Pavel Izmailov · Timur Garipov · Dmitry Vetrov · Andrew Gordon Wilson -
2018 : Making the Case for using more Inductive Bias in Deep Learning »
Max Welling -
2018 : Panel disucssion »
Max Welling · Tim Genewein · Edwin Park · Song Han -
2018 : TBC 15 »
Yarin Gal -
2018 : Invited Speaker #5 Yarin Gal »
Yarin Gal -
2018 : Efficient Computation of Deep Convolutional Neural Networks: A Quantization Perspective »
Max Welling -
2018 : Prof. Max Welling »
Max Welling -
2018 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2018 Workshop: NIPS 2018 workshop on Compact Deep Neural Networks with industrial applications »
Lixin Fan · Zhouchen Lin · Max Welling · Yurong Chen · Werner Bailer -
2018 : Opening Remarks »
Yarin Gal -
2018 Poster: BRUNO: A Deep Recurrent Model for Exchangeable Data »
Iryna Korshunova · Jonas Degrave · Ferenc Huszar · Yarin Gal · Arthur Gretton · Joni Dambre -
2018 Poster: MetaGAN: An Adversarial Approach to Few-Shot Learning »
Ruixiang ZHANG · Tong Che · Zoubin Ghahramani · Yoshua Bengio · Yangqiu Song -
2018 Poster: Graphical Generative Adversarial Networks »
Chongxuan LI · Max Welling · Jun Zhu · Bo Zhang -
2018 Poster: 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data »
Maurice Weiler · Wouter Boomsma · Mario Geiger · Max Welling · Taco Cohen -
2017 : Panel Session »
Neil Lawrence · Finale Doshi-Velez · Zoubin Ghahramani · Yann LeCun · Max Welling · Yee Whye Teh · Ole Winther -
2017 : Deep Bayes for Distributed Learning, Uncertainty Quantification and Compression »
Max Welling -
2017 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Diederik Kingma · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2017 : Panel session »
Iain Murray · Max Welling · Juan Carrasquilla · Anatole von Lilienfeld · Gilles Louppe · Kyle Cranmer -
2017 : Panel: On the Foundations and Future of Approximate Inference »
David Blei · Zoubin Ghahramani · Katherine Heller · Tim Salimans · Max Welling · Matthew D. Hoffman -
2017 : Invited talk 1: Deep recurrent inverse modeling for radio astronomy and fast MRI imaging »
Max Welling -
2017 Workshop: Advances in Approximate Bayesian Inference »
Francisco Ruiz · Stephan Mandt · Cheng Zhang · James McInerney · Dustin Tran · David Blei · Max Welling · Tamara Broderick · Michalis Titsias -
2017 : Panel: "Should we prioritize research on human-like AI or something different?" »
Cynthia Dwork · David Runciman · Zoubin Ghahramani -
2017 Symposium: Kinds of intelligence: types, tests and meeting the needs of society »
José Hernández-Orallo · Zoubin Ghahramani · Tomaso Poggio · Adrian Weller · Matthew Crosby -
2017 Poster: Concrete Dropout »
Yarin Gal · Jiri Hron · Alex Kendall -
2017 Poster: Causal Effect Inference with Deep Latent-Variable Models »
Christos Louizos · Uri Shalit · Joris Mooij · David Sontag · Richard Zemel · Max Welling -
2017 Poster: What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? »
Alex Kendall · Yarin Gal -
2017 Spotlight: What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? »
Alex Kendall · Yarin Gal -
2017 Poster: Bayesian Compression for Deep Learning »
Christos Louizos · Karen Ullrich · Max Welling -
2017 Poster: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning »
Shixiang (Shane) Gu · Timothy Lillicrap · Richard Turner · Zoubin Ghahramani · Bernhard Schölkopf · Sergey Levine -
2017 Poster: Real Time Image Saliency for Black Box Classifiers »
Piotr Dabkowski · Yarin Gal -
2016 : Panel Discussion »
Shakir Mohamed · David Blei · Ryan Adams · José Miguel Hernández-Lobato · Ian Goodfellow · Yarin Gal -
2016 : Max Welling : Making Deep Learning Efficient Through Sparsification »
Max Welling -
2016 : Automatic Discovery of the Statistical Types of Variables in a Dataset »
Isabel Valera · Zoubin Ghahramani -
2016 : History of Bayesian neural networks »
Zoubin Ghahramani -
2016 Workshop: Bayesian Deep Learning »
Yarin Gal · Christos Louizos · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2016 Workshop: Towards an Artificial Intelligence for Data Science »
Charles Sutton · James Geddes · Zoubin Ghahramani · Padhraic Smyth · Chris Williams -
2016 : How Machine Learning Research Can Address Key Societal and Governance Issues »
Zoubin Ghahramani -
2016 Workshop: People and machines: Public views on machine learning, and what this means for machine learning researchers »
Susannah Odell · Peter Donnelly · Jessica Montgomery · Sabine Hauert · Zoubin Ghahramani · Katherine Gorman -
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan -
2016 Poster: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks »
Yarin Gal · Zoubin Ghahramani -
2016 Poster: Improving Variational Autoencoders with Inverse Autoregressive Flow »
Diederik Kingma · Tim Salimans · Rafal Jozefowicz · Peter Chen · Xi Chen · Ilya Sutskever · Max Welling -
2016 Poster: Distributed Flexible Nonlinear Tensor Factorization »
Shandian Zhe · Kai Zhang · Pengyuan Wang · Kuang-chih Lee · Zenglin Xu · Yuan Qi · Zoubin Ghahramani -
2015 : Bayesian Optimization »
Zoubin Ghahramani · Bobak Shahriari -
2015 Workshop: Black box learning and inference »
Josh Tenenbaum · Jan-Willem van de Meent · Tejas Kulkarni · S. M. Ali Eslami · Brooks Paige · Frank Wood · Zoubin Ghahramani -
2015 Workshop: Scalable Monte Carlo Methods for Bayesian Analysis of Big Data »
Babak Shahbaba · Yee Whye Teh · Max Welling · Arnaud Doucet · Christophe Andrieu · Sebastian J. Vollmer · Pierre Jacob -
2015 : *Max Welling* Optimization Monte Carlo »
Max Welling -
2015 Symposium: Deep Learning Symposium »
Yoshua Bengio · Marc'Aurelio Ranzato · Honglak Lee · Max Welling · Andrew Y Ng -
2015 Poster: Particle Gibbs for Infinite Hidden Markov Models »
Nilesh Tripuraneni · Shixiang (Shane) Gu · Hong Ge · Zoubin Ghahramani -
2015 Poster: Neural Adaptive Sequential Monte Carlo »
Shixiang (Shane) Gu · Zoubin Ghahramani · Richard Turner -
2015 Poster: Bayesian dark knowledge »
Anoop Korattikara Balan · Vivek Rathod · Kevin Murphy · Max Welling -
2015 Poster: MCMC for Variationally Sparse Gaussian Processes »
James Hensman · Alexander Matthews · Maurizio Filippone · Zoubin Ghahramani -
2015 Poster: Optimization Monte Carlo: Efficient and Embarrassingly Parallel Likelihood-Free Inference »
Ted Meeds · Max Welling -
2015 Poster: Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions »
Amar Shah · Zoubin Ghahramani -
2015 Invited Talk: Probabilistic Machine Learning: Foundations and Frontiers »
Zoubin Ghahramani -
2015 Poster: Statistical Model Criticism using Kernel Two Sample Tests »
James R Lloyd · Zoubin Ghahramani -
2015 Poster: Variational Dropout and the Local Reparameterization Trick »
Diederik Kingma · Tim Salimans · Max Welling -
2014 Workshop: Bayesian Optimization in Academia and Industry »
Zoubin Ghahramani · Ryan Adams · Matthew Hoffman · Kevin Swersky · Jasper Snoek -
2014 Workshop: ABC in Montreal »
Max Welling · Neil D Lawrence · Richard D Wilkinson · Ted Meeds · Christian X Robert -
2014 Poster: Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models »
Yarin Gal · Mark van der Wilk · Carl Edward Rasmussen -
2014 Poster: Predictive Entropy Search for Efficient Global Optimization of Black-box Functions »
José Miguel Hernández-Lobato · Matthew Hoffman · Zoubin Ghahramani -
2014 Poster: Semi-supervised Learning with Deep Generative Models »
Diederik Kingma · Shakir Mohamed · Danilo Jimenez Rezende · Max Welling -
2014 Poster: Gaussian Process Volatility Model »
Yue Wu · José Miguel Hernández-Lobato · Zoubin Ghahramani -
2014 Demonstration: Machine Learning in the Browser »
Ted Meeds · Remco Hendriks · Said Al Faraby · Magiel Bruntink · Max Welling -
2014 Spotlight: Semi-supervised Learning with Deep Generative Models »
Diederik Kingma · Shakir Mohamed · Danilo Jimenez Rezende · Max Welling -
2014 Spotlight: Predictive Entropy Search for Efficient Global Optimization of Black-box Functions »
José Miguel Hernández-Lobato · Matthew Hoffman · Zoubin Ghahramani -
2014 Poster: General Table Completion using a Bayesian Nonparametric Model »
Isabel Valera · Zoubin Ghahramani -
2013 Workshop: Probabilistic Models for Big Data »
Neil D Lawrence · Joaquin Quiñonero-Candela · Tianshi Gao · James Hensman · Zoubin Ghahramani · Max Welling · David Blei · Ralf Herbrich -
2013 Session: Oral Session 5 »
Zoubin Ghahramani -
2012 Poster: Collaborative Gaussian Processes for Preference Learning »
Neil Houlsby · José Miguel Hernández-Lobato · Ferenc Huszar · Zoubin Ghahramani -
2012 Poster: A nonparametric variable clustering model »
David A Knowles · Konstantina Palla · Zoubin Ghahramani -
2012 Poster: Random function priors for exchangeable graphs and arrays »
James R Lloyd · Daniel Roy · Peter Orbanz · Zoubin Ghahramani -
2012 Poster: Active Learning of Model Evidence Using Bayesian Quadrature »
Michael A Osborne · David Duvenaud · Roman Garnett · Carl Edward Rasmussen · Stephen J Roberts · Zoubin Ghahramani -
2012 Poster: Continuous Relaxations for Discrete Hamiltonian Monte Carlo »
Zoubin Ghahramani · Yichuan Zhang · Charles Sutton · Amos Storkey -
2012 Spotlight: Continuous Relaxations for Discrete Hamiltonian Monte Carlo »
Zoubin Ghahramani · Yichuan Zhang · Charles Sutton · Amos Storkey -
2012 Poster: The Time-Marginalized Coalescent Prior for Hierarchical Clustering »
Levi Boyles · Max Welling -
2011 Workshop: Copulas in Machine Learning »
Gal Elidan · Zoubin Ghahramani · John Lafferty -
2011 Poster: Testing a Bayesian Measure of Representativeness Using a Large Image Database »
Joshua T Abbott · Katherine Heller · Zoubin Ghahramani · Tom Griffiths -
2011 Poster: Statistical Tests for Optimization Efficiency »
Levi Boyles · Anoop Korattikara · Deva Ramanan · Max Welling -
2010 Workshop: Transfer Learning Via Rich Generative Models »
Russ Salakhutdinov · Ryan Adams · Josh Tenenbaum · Zoubin Ghahramani · Tom Griffiths -
2010 Talk: Unifying Views in Unsupervised Learning »
Zoubin Ghahramani -
2010 Oral: Tree-Structured Stick Breaking for Hierarchical Data »
Ryan Adams · Zoubin Ghahramani · Michael Jordan -
2010 Poster: Tree-Structured Stick Breaking for Hierarchical Data »
Ryan Adams · Zoubin Ghahramani · Michael Jordan -
2010 Poster: On Herding and the Perceptron Cycling Theorem »
Andrew E Gelfand · Yutian Chen · Laurens van der Maaten · Max Welling -
2010 Spotlight: Copula Processes »
Andrew Wilson · Zoubin Ghahramani -
2010 Poster: Copula Processes »
Andrew Wilson · Zoubin Ghahramani -
2009 Workshop: Nonparametric Bayes »
Dilan Gorur · Francois Caron · Yee Whye Teh · David B Dunson · Zoubin Ghahramani · Michael Jordan -
2009 Poster: Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process »
Shakir Mohamed · David A Knowles · Zoubin Ghahramani · Finale P Doshi-Velez -
2008 Session: Oral Session 10: Nonparametric Processes, Scene Processing and Image Statistics »
Max Welling -
2008 Poster: The Infinite Factorial Hidden Markov Model »
Jurgen Van Gael · Yee Whye Teh · Zoubin Ghahramani -
2008 Poster: Bayesian Exponential Family PCA »
Shakir Mohamed · Katherine Heller · Zoubin Ghahramani -
2008 Poster: Asynchronous Distributed Learning of Topic Models »
Arthur Asuncion · Padhraic Smyth · Max Welling -
2008 Spotlight: Bayesian Exponential Family PCA »
Shakir Mohamed · Katherine Heller · Zoubin Ghahramani -
2008 Spotlight: The Infinite Factorial Hidden Markov Model »
Jurgen Van Gael · Yee Whye Teh · Zoubin Ghahramani -
2007 Spotlight: Collapsed Variational Inference for HDP »
Yee Whye Teh · Kenichi Kurihara · Max Welling -
2007 Spotlight: Distributed Inference for Latent Dirichlet Allocation »
David Newman · Arthur Asuncion · Padhraic Smyth · Max Welling -
2007 Poster: Infinite State Bayes-Nets for Structured Domains »
Max Welling · Ian Porteous · Evgeniy Bart -
2007 Poster: Hidden Common Cause Relations in Relational Learning »
Ricardo Silva · Wei Chu · Zoubin Ghahramani -
2007 Poster: Collapsed Variational Inference for HDP »
Yee Whye Teh · Kenichi Kurihara · Max Welling -
2007 Poster: Distributed Inference for Latent Dirichlet Allocation »
David Newman · Arthur Asuncion · Padhraic Smyth · Max Welling -
2007 Spotlight: Infinite State Bayes-Nets for Structured Domains »
Max Welling · Ian Porteous · Evgeniy Bart -
2007 Spotlight: Hidden Common Cause Relations in Relational Learning »
Ricardo Silva · Wei Chu · Zoubin Ghahramani -
2006 Poster: Relational Learning with Gaussian Processes »
Wei Chu · Vikas Sindhwani · Zoubin Ghahramani · Sathiya Selvaraj Keerthi -
2006 Poster: Structure Learning in Markov Random Fields »
Sridevi Parise · Max Welling -
2006 Poster: Accelerated Variational Dirichlet Process Mixtures »
Kenichi Kurihara · Max Welling · Nikos Vlassis -
2006 Poster: Modeling Dyadic Data with Binary Latent Features »
Ted Meeds · Zoubin Ghahramani · Radford M Neal · Sam T Roweis -
2006 Spotlight: Accelerated Variational Dirichlet Process Mixtures »
Kenichi Kurihara · Max Welling · Nikos Vlassis -
2006 Spotlight: Modeling Dyadic Data with Binary Latent Features »
Ted Meeds · Zoubin Ghahramani · Radford M Neal · Sam T Roweis -
2006 Poster: A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation »
Yee Whye Teh · David Newman · Max Welling