Workshop
Deep Generative Models and Downstream Applications
José Miguel Hernández-Lobato · Yingzhen Li · Yichuan Zhang · Cheng Zhang · Austin Tripp · Weiwei Pan · Oren Rippel

Tue Dec 14 06:00 AM -- 03:00 PM (PST)

Deep generative models (DGMs) have become an important research branch in deep learning, covering a broad family of methods such as variational autoencoders, generative adversarial networks, normalizing flows, energy-based models and autoregressive models. Many of these methods have been shown to achieve state-of-the-art results in generating synthetic data of different types, such as text, speech, images, music and molecules. Beyond generating synthetic data, DGMs are of particular relevance in many practical downstream applications. A few examples are imputation and acquisition of missing data, anomaly detection, data denoising, compressed sensing, data compression, image super-resolution, molecule optimization, interpretation of machine learning methods, identifying causal structures in data and generation of molecular structures. At present, however, there seems to be a disconnect between researchers working on new DGM-based methods and researchers applying such methods to practical problems like the ones mentioned above. This workshop aims to bridge this gap by bringing the two communities together.

 Tue 6:00 a.m. - 6:10 a.m. Opening remarks (Presentation) 🔗 Tue 6:10 a.m. - 6:25 a.m. Invited talk #1: Aapo Hyvärinen (Presentation) Aapo Hyvarinen 🔗 Tue 6:25 a.m. - 6:30 a.m. Q&A Invited Talk #1 (Q&A) 🔗 Tue 6:30 a.m. - 6:45 a.m. Invited talk #2: Finale Doshi-Velez (Presentation) Finale Doshi-Velez 🔗 Tue 6:45 a.m. - 6:50 a.m. Q&A Invited Talk #2 (Q&A) 🔗 Tue 6:50 a.m. - 7:05 a.m. Invited Talk #3: Rianne van den Berg (Presentation) Rianne van den Berg 🔗 Tue 7:05 a.m. - 7:10 a.m. Q&A Invited Talk #3 (Q&A) 🔗 Tue 7:10 a.m. - 7:20 a.m. Particle Dynamics for Learning EBMs (Oral)  link »    Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such sampling paradigms run MCMC targeting the current model, which requires infinitely long chains to generate samples from the true energy distribution and is problematic in practice. This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model. We accomplish this by viewing the evolution of the modeling distribution as (i) the evolution of the energy function, and (ii) the evolution of the samples from this distribution along some vector field. We subsequently derive this time-dependent vector field such that the particles following this field are approximately distributed as the current density model. Thereby we match the evolution of the particles with the evolution of the energy function prescribed by the learning procedure. Importantly, unlike Monte Carlo sampling, our method targets to match the current distribution in a finite time. Finally, we demonstrate its effectiveness empirically comparing to MCMC-based learning methods. 
Link » Kirill Neklyudov · Priyank Jaini · Max Welling 🔗 Tue 7:20 a.m. - 7:30 a.m. VAEs meet Diffusion Models: Efficient and High-Fidelity Generation (Oral)  link »    Diffusion Probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, Variational Autoencoders (VAEs) have access to a low-dimensional latent space but, despite recent advances, exhibit poor sample quality. We present VAEDM, a novel generative framework for \textit{refining} VAE generated samples using diffusion models while also presenting a novel conditional forward process parameterization for diffusion models. We show that the resulting parameterization can improve upon the unconditional diffusion model in terms of sampling efficiency during inference while also equipping diffusion models with the low-dimensional VAE inferred latent code. Furthermore, we show that the proposed model exhibits out-of-the-box capabilities for downstream tasks like image superresolution and denoising. Link » Kushagra Pandey · Avideep Mukherjee · Piyush Rai · Abhishek Kumar 🔗 Tue 7:30 a.m. - 7:35 a.m. Contributed poster talk #1-2 Q&A (Q&A) 🔗 Tue 7:35 a.m. - 8:00 a.m. Break #1 (Break) 🔗 Tue 8:00 a.m. - 8:15 a.m. Invited talk #4: Chris Williams (Presentation) Chris Williams 🔗 Tue 8:15 a.m. - 8:20 a.m. Q&A Invited Talk #4 (Q&A) 🔗 Tue 8:20 a.m. - 8:35 a.m. Invited talk #5: Mihaela van der Schaar (Presentation) Mihaela van der Schaar 🔗 Tue 8:35 a.m. - 8:40 a.m. Q&A Invited Talk #5 (Q&A) 🔗 Tue 8:40 a.m. - 8:55 a.m. Invited Talk #6: Luisa Zintgraf (Presentation) Luisa Zintgraf 🔗 Tue 8:55 a.m. - 9:00 a.m. Q&A Invited Talk #6 (Q&A) 🔗 Tue 9:00 a.m. - 9:10 a.m. 
Your Dataset is a Multiset and You Should Compress it Like One (Oral)  link »    Neural Compressors (NCs) are codecs that leverage neural networks and entropy coding to achieve competitive compression performance for images, audio, and other data types. These compressors exploit parallel hardware, and are particularly well suited to compressing i.i.d. batches of data. The average number of bits needed to represent each example is at least the well-known cross-entropy. However, the cross-entropy bound assumes the order of the compressed examples in a batch is preserved, which in many applications is not necessary. The number of bits used to implicitly store the order information is the logarithm of the number of unique permutations of the dataset. In this work, we present a method that reduces the bitrate of any codec by exactly the number of bits needed to store the order, at the expense of shuffling the dataset in the process. Conceptually, our method applies bits-back coding to a latent variable model with observed symbol counts (i.e. multiset) and a latent permutation defining the ordering, and does not require retraining any models. We present experiments with both lossy off-the-shelf codecs (WebP) as well as lossless NCs. On Binarized MNIST, lossless NCs achieved savings of up to $7.6\%$, while adding only $10\%$ extra compute time. Link » Daniel Severo · James Townsend · Ashish Khisti · Alireza Makhzani · Karen Ullrich 🔗 Tue 9:10 a.m. - 9:20 a.m. Contributed poster talk #3 Q&A + Best paper awards (Q&A) 🔗 Tue 9:20 a.m. - 10:00 a.m. Break #2 (Break) 🔗 Tue 10:00 a.m. - 11:00 a.m. Poster session #1 (poster session (gathertown)) 🔗 Tue 11:00 a.m. - 11:30 a.m. Panel Discussion (Discussion Panel) 🔗 Tue 11:30 a.m. - 11:45 a.m. Invited Talk #7: Romain Lopez (Presentation) Romain Lopez 🔗 Tue 11:45 a.m. - 11:50 a.m. Q&A Invited Talk #7 (Q&A) 🔗 Tue 11:50 a.m. - 12:10 p.m. Break #3 (Break) 🔗 Tue 12:10 p.m. - 12:25 p.m. 
Invited talk #8: Alex Anderson (Presentation) Alex Anderson 🔗 Tue 12:25 p.m. - 12:30 p.m. Q&A Invited Talk #8 (Q&A) 🔗 Tue 12:30 p.m. - 12:40 p.m. AGE: Enhancing the Convergence on GANs using Alternating extra-gradient with Gradient Extrapolation (Oral)  link »    Generative adversarial networks (GANs) are notably difficult to train since the parameters can get stuck in a local optimum. As a result, methods often suffer not only from degeneration of the convergence speed but also from limitations in the representational power of the trained network. Existing optimization methods to stabilize convergence require multiple gradient computations per iteration. We propose AGE, an alternating extra-gradient method with nonlinear gradient extrapolation, that overcomes these computational inefficiencies and exhibits better convergence properties. It estimates the lookahead step using a nonlinear mixing of past gradient sequences. Empirical results on CIFAR10, CelebA, and several synthetic datasets demonstrate that the introduced approach significantly improves convergence and yields better generative models. Link » Huan He · Shifan Zhao · Yuanzhe Xi · Joyce Ho 🔗 Tue 12:40 p.m. - 12:50 p.m. Sample-Efficient Generation of Novel Photo-acid Generator Molecules using a Deep Generative Model (Oral)  link »    Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light. These compounds are critical components of the photolithography processes that are used in the manufacture of semiconductor logic and memory chips. The exponential increase in the demand for semiconductors has highlighted the need for discovering novel photo-acid generators. While de novo molecule design using deep generative models has been widely employed for drug discovery and material design, its application to the creation of novel photo-acid generators poses several unique challenges, such as lack of property labels. 
In this paper, we highlight these challenges and propose a generative modeling approach that utilizes conditional generation from a pre-trained deep autoencoder and expert-in-the-loop techniques. The validity of the proposed approach was evaluated with the help of subject matter experts, indicating the promise of such an approach for applications beyond the creation of novel photo-acid generators. Link » Samuel Hoffman · Vijil Chenthamarakshan · Dmitry Zubarev · Daniel Sanders · Payel Das 🔗 Tue 12:50 p.m. - 12:55 p.m. Contributed poster talk #5-6 Q&A (Q&A) 🔗 Tue 12:55 p.m. - 1:10 p.m. Invited talk #9: Zhifeng Kong (Presentation) Zhifeng Kong 🔗 Tue 1:10 p.m. - 1:15 p.m. Q&A Invited Talk #9 (Q&A) 🔗 Tue 1:15 p.m. - 1:30 p.m. Invited talk #10: Johannes Ballé (Presentation) Johannes Ballé 🔗 Tue 1:30 p.m. - 1:35 p.m. Q&A Invited Talk #10 (Q&A) 🔗 Tue 1:35 p.m. - 1:45 p.m. Bayesian Image Reconstruction using Deep Generative Models (Oral)  link »    Machine learning models are commonly trained end-to-end and in a supervised setting, using paired (input, output) data. Examples include recent super-resolution methods that train on pairs of (low-resolution, high-resolution) images. However, these end-to-end approaches require re-training every time there is a distribution shift in the inputs (e.g., night images vs daylight) or relevant latent variables (e.g., camera blur or hand motion). In this work, we leverage state-of-the-art (SOTA) generative models (here StyleGAN2) for building powerful image priors, which enable application of Bayes' theorem for many downstream reconstruction tasks. Our method, "Bayesian Reconstruction through Generative Models" (BRGM), uses a single pre-trained generator model to solve different image restoration tasks, i.e., super-resolution and in-painting, by combining it with different forward corruption models. 
We keep the weights of the generator model fixed, and reconstruct the image by computing the Bayesian maximum a-posteriori (MAP) estimate over the input latent vector that generated the reconstructed image. We further use Variational Inference to approximate the posterior distribution over the latent vectors, from which we sample multiple solutions. We demonstrate BRGM on three large and diverse datasets: (i) 60,000 images from the Flickr Faces High Quality dataset, (ii) 240,000 chest X-rays from MIMIC III, and (iii) a combined collection of 5 brain MRI datasets with 7,329 scans. Across all three datasets and without any dataset-specific hyperparameter tuning, our simple approach yields performance competitive with current task-specific state-of-the-art methods on super-resolution and in-painting, while being more generalisable and without requiring any training. Our source code and pre-trained models are available online: https://razvanmarinescu.github.io/brgm/ Link » Razvan Marinescu · Daniel Moyer · Polina Golland 🔗 Tue 1:45 p.m. - 1:55 p.m. Grapher: Multi-Stage Knowledge Graph Construction using Pretrained Language Models (Oral)  link »    In this work we address the problem of Knowledge Graph (KG) construction from text, proposing a novel end-to-end multi-stage Grapher system that separates the overall generation process into two stages. The graph nodes are generated first using a pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the textual descriptions. For each stage we propose several architectural choices that can be used depending on the available training resources. We evaluated the Grapher on the recent WebNLG 2020 Challenge dataset, achieving competitive results on the text-to-RDF generation task, as well as on the recent large-scale TekGen dataset, showing strong overall performance. 
We believe that the proposed Grapher system can serve as a viable KG construction alternative to the existing linearization or sampling-based graph generation approaches. Link » Igor Melnyk · Pierre Dognin · Payel Das 🔗 Tue 1:55 p.m. - 2:00 p.m. Contributed poster talk #7-8 Q&A (Q&A) 🔗 Tue 2:00 p.m. - 3:00 p.m. Poster session #2 (poster session (gathertown)) 🔗 - Transparent Liquid Segmentation for Robotic Pouring (Poster)  link » Liquid state estimation is important for robotics tasks such as pouring; however, estimating the state of transparent liquids is a challenging problem. We propose a novel segmentation pipeline that can segment transparent liquids such as water from a static, RGB image without requiring any manual annotations or heating of the liquid for training. Instead, we use a generative model that is capable of translating unpaired images of colored liquids into synthetically generated transparent liquid images. Segmentation labels of colored liquids are obtained automatically using background subtraction. We use paired samples of synthetically generated transparent liquid images and background subtraction for our segmentation pipeline. Our experiments show that we are able to accurately predict a segmentation mask for transparent liquids without requiring any manual annotations. We demonstrate the utility of transparent liquid segmentation in a robotic pouring task that controls pouring by perceiving liquid height in a transparent cup. Accompanying video and supplementary information can be found at https://sites.google.com/view/roboticliquidpouring Link » Gautham Narayan Narasimhan · Kai Zhang · Benjamin Eisner · Xingyu Lin · David Held 🔗 - Uncertainty-aware Labelled Augmentations for High Dimensional Latent Space Bayesian Optimization (Poster)  link » Black-box optimization problems are ubiquitous and of importance in many critical areas of science and engineering. 
Bayesian optimisation (BO) has emerged over the past years as one of the most successful techniques for optimising expensive black-box objectives. However, efficient scaling of BO to high-dimensional settings has proven to be extremely challenging. Traditional strategies based on projecting high-dimensional input data to a lower-dimensional manifold, such as Variational autoencoders (VAE) and Generative adversarial networks (GAN), have improved BO performance in high-dimensional regimes, but their dependence on excessive labeled input data has been widely reported. In this work, we target the data-greedy nature of deep generative models by constructing uncertainty-aware task-specific labeled data augmentations using Gaussian processes (GPs). Our approach outperforms existing state-of-the-art methods on machine learning tasks and demonstrates more informative data representation with limited supervision. Link » Ekansh Verma · Souradip Chakraborty 🔗 - How to Reward Your Drug Agent? (Poster)  link » Constructing novel molecules from scratch using deep generative models provides a useful alternative to traditional virtual screening methods, which are limited to searching already discovered chemicals. In particular, molecular optimisation combined with sampling guided by reinforcement learning seems like a promising path for discovering novel molecular designs and allows for domain-specific customization of the desired solutions. The choice of a chemically relevant reward function and the exhaustive assessment of its properties remain a challenging task. We introduce a reward function that gives enough flexibility to quantify the biological activity with respect to a selected protein target, drug-likeness and synthesizability, and that incorporates a custom index of penalised physico-chemical properties. 
In order to customise the hyper-parameters influencing the RL agent's performance, we propose a methodology that helps to quantify the chemical relevance of the reward function by quantifying the chemical relevance of the samples. We assess the performance of the reward function by docking the molecules with relevant protein targets and quantify the difference with the ground truth samples using Wasserstein distance. Link » Andrea Karlova · Wim Dehaen · Andrei Penciu 🔗 - Searching for the Weirdest Stars: A Convolutional Autoencoder-Based Pipeline For Detecting Anomalous Periodic Variable Stars (Poster)  link » The physical processes of stars are encoded in their periodic pulsations. Millions of variable stars will be observed by the upcoming Vera Rubin Observatory's Legacy Survey of Space and Time. Here, we present a convolutional autoencoder-based pipeline as an automatic approach to search for anomalous periodic variables within the Zwicky Transient Facility Catalog of Periodic Variable Stars (ZTF CPVS). We encode their light curves using a convolutional autoencoder, and we use an isolation forest to sort each periodic variable star by an anomaly score computed in the latent space. Our overall most anomalous events share some similarities: they are mostly highly variable and irregular evolved stars. An exploration of multiwavelength data suggests that they are most likely Red Giant or Asymptotic Giant Branch stars concentrated in the disk of the Milky Way. Furthermore, we use the learned latent feature for the classification of periodic variables through a hierarchical random forest. This novel semi-supervised approach allows astronomers to identify the most anomalous events within a given physical class, accelerating the potential for scientific discovery. 
Link » Ho-Sang Chan · Siu Hei Cheung · Victoria Villar · Shirley Ho 🔗 - XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches (Poster)  link » Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations. Link » V Manushree · Sameer Saxena · Parna Chowdhury · Manisimha Varma Manthena · Harsh Rathod · Ankita Ghosh · Sahil Khose 🔗 - Conditional Generation of Periodic Signals with Fourier-Based Decoder (Poster)  link » Periodic signals play an important role in daily life. Although conventional sequential models have shown remarkable success in various fields, they still fall short in modeling periodicity; they either collapse, diverge or ignore details. In this paper, we introduce a novel framework inspired by Fourier series to generate periodic signals. We first decompose the given signals into multiple sines and cosines and then conditionally generate periodic signals with the output components. We show our model's efficacy on three tasks: reconstruction, imputation and conditional generation. Our model outperforms baselines in all tasks and shows more stable and refined results. Link » Jiyoung Lee · Wonjae Kim · DAEHOON GWAK · Edward Choi 🔗 - Palette: Image-to-Image Diffusion Models (Poster)  link » We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. 
Palette models trained on four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG restoration) outperform strong GAN and regression baselines and bridge the gap with natural images in terms of sample quality scores. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of an $L_2$ vs. $L_1$ loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task Palette model performs as well as or better than task-specific specialist counterparts. Check out https://bit.ly/palette-diffusion for more details. Link » Chitwan Saharia · William Chan · Huiwen Chang · Chris Lee · Jonathan Ho · Tim Salimans · David Fleet · Mohammad Norouzi 🔗 - Semi-supervised Multiple Instance Learning using Variational Auto-Encoders (Poster)  link » We consider the multiple-instance learning (MIL) paradigm, which is a special case of supervised learning where training instances are grouped into bags. In MIL, the hidden instance labels do not have to be the same as the label of the comprising bag. On the other hand, the hybrid modelling approach is known to possess advantages mainly due to the smooth consolidation of both discriminative and generative components. 
In this paper, we investigate whether we can get the best of both worlds (MIL and hybrid modelling), especially in a semi-supervised learning (SSL) setting. We first integrate a variational autoencoder (VAE), which is a powerful deep generative model, with an attention-based MIL classifier, then evaluate the performance of the resulting model in SSL. We assess the proposed approach on an established benchmark as well as a real-world medical dataset. Link » Ali Nihat Uzunalioglu · Tameem Adel · Jakub M. Tomczak 🔗 - Variational Autoencoder with Differentiable Physics Engine for Human Gait Analysis and Synthesis (Poster)  link » We address the task of learning generative models of human gait. As gait motion always follows the physical laws, a generative model should also produce outputs that comply with the physical laws, particularly rigid body dynamics with contact and friction. We propose a deep generative model combined with a differentiable physics engine, which outputs physically plausible signals by construction. The proposed model is also equipped with a policy network conditioned on each sample. We show an example of the application of such a model to style transfer of gait. Link » Naoya Takeishi · Alexandros Kalousis 🔗 - A Binded VAE for Inorganic Material Generation (Poster)  link » Designing new industrial materials with desired properties can be very expensive and time consuming. The main difficulty is to generate compounds that correspond to realistic materials. Indeed, the description of the compounds as vectors of components' proportions is characterized by severe sparsity. Furthermore, traditional generative model validation processes such as visual verification, FID and Inception scores cannot be used in this context. To tackle these issues, we develop an original Binded-VAE model tailored to generate sharp datasets with high sparsity. We validate the model with novel metrics adapted to the problem of compound generation. 
We show on a real rubber compound design problem that the proposed approach outperforms standard generative models, which opens new perspectives for material design optimization. Link » Fouad OUBARI · Antoine De mathelin · Rodrigue Décatoire · Mathilde MOUGEOT 🔗 - Certifiably Robust Variational Autoencoders (Poster)  link » We introduce an approach for training Variational Autoencoders (VAEs) that are certifiably robust to adversarial attack. Specifically, we first derive actionable bounds on the minimal size of an input perturbation required to change a VAE's reconstruction by more than an allowed amount, with these bounds depending on certain key parameters such as the Lipschitz constants of the encoder and decoder. We then show how these parameters can be controlled, thereby providing a mechanism to ensure a priori that a VAE will attain a desired level of robustness. Moreover, we extend this to a complete practical approach for training such VAEs to ensure our criteria are met. Critically, our method allows one to specify a desired level of robustness upfront and then train a VAE that is guaranteed to achieve this robustness. We further demonstrate that these Lipschitz-constrained VAEs are more robust to attack than standard VAEs in practice. Link » Ben Barrett · Alexander Camuto · Matthew Willetts · Thomas Rainforth 🔗 - Improving Model Compatibility of Generative Adversarial Networks by Boundary Calibration (Poster)  link » Generative Adversarial Networks (GANs) are a powerful family of models that learn an underlying distribution to generate synthetic data. Many existing studies of GANs focus on improving the realness of the generated image data for visual applications, and few are concerned with improving the quality of the generated data for training other classifiers, a task known as the model compatibility problem. 
As a consequence, existing GANs often prefer generating 'easier' synthetic data that are far from the boundaries of the classifiers, and refrain from generating near-boundary data, which are known to play an important role in training the classifiers. To improve GANs in terms of model compatibility, we propose Boundary-Calibration GANs (BCGANs), which leverage the boundary information from a set of pre-trained classifiers using the original data. In particular, we introduce an auxiliary Boundary-Calibration loss (BC-loss) into the generator of the GAN to match the statistics between the posterior distributions of original data and generated data with respect to the boundaries of the pre-trained classifiers. The BC-loss is provably unbiased and can be easily coupled with different GAN variants to improve their model compatibility. Experimental results demonstrate that BCGANs not only generate realistic images like the original GANs but also achieve better model compatibility than the original GANs. Link » Si-An Chen · Chun-Liang Li · Hsuan-Tien Lin 🔗 - Instance Semantic Segmentation Benefits from Generative Adversarial Networks (Poster)  link » In the design of instance segmentation networks that reconstruct masks, segmentation is often taken as its literal definition -- assigning each pixel a label. This has led to thinking of the problem as a template matching one with the goal of minimizing the loss between the reconstructed and the ground truth pixels. Rethinking reconstruction networks as a generator, we define the problem of predicting masks as a GANs game framework: A segmentation network generates the masks, and a discriminator network decides on the quality of the masks. To demonstrate this game, we show effective modifications on the general segmentation framework in Mask R-CNN. 
We find that playing the game in feature space is more effective than in pixel space, leading to stable training between the discriminator and the generator; that predicting object coordinates should be replaced by predicting contextual regions for objects; and that overall the adversarial loss helps the performance and removes the need for any custom settings per data domain. We test our framework in various domains and report on cellphone recycling, autonomous driving, large-scale object detection, and medical glands. We observe that, in general, GANs yield masks that account for crisper boundaries, clutter, small objects, and details, whether in domains of regular shapes or of heterogeneous and coalescing shapes. Our code for reproducing the results is available publicly. Link » Quang Le · KAMAL YOUCEF-TOUMI · Dzmitry Tsetserukou · Ali Jahanian 🔗 - Classifier-Free Diffusion Guidance (Poster)  link » Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. This method combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. We show that guidance can be performed by a pure generative model without such a classifier: we jointly train a conditional and an unconditional diffusion model, and find that it is possible to combine the resulting conditional and unconditional scores to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance. Link » Jonathan Ho · Tim Salimans 🔗 - Accurate Imputation and Efficient Data Acquisition with Transformer-based VAEs (Poster)  link » Predicting missing values in tabular data, with uncertainty, is an essential task by itself as well as for downstream tasks such as personalized data acquisition. 
It is not clear whether state-of-the-art deep generative models for these tasks are well equipped to model the complex relationships that may exist between different features, especially when the subset of observed data are treated as a set. In this work we propose new attention-based models for estimating the joint conditional distribution of randomly missing values in mixed-type tabular data. The models improve on the state-of-the-art Partial Variational Autoencoder (Ma et al. 2019) on a range of imputation and information acquisition tasks. Link » Sarah Lewis · Tatiana Matejovicova · Yingzhen Li · Angus Lamb · Yordan Zaykov · Miltiadis Allamanis · Cheng Zhang 🔗 - Probabilistic Hierarchical Forecasting with Deep Poisson Mixtures (Poster)  link » Hierarchical forecasting problems arise when time series compose a group structure that naturally defines aggregation and disaggregation coherence constraints for the predictions. In this work, we explore a new forecast representation, the Poisson Mixture Mesh (PMM), that can produce probabilistic, coherent predictions; it is compatible with the neural forecasting innovations, and defines simple aggregation and disaggregation rules capable of accommodating hierarchical structures, unknown during its optimization. We perform an empirical evaluation to compare the PMM to other methods on Australian domestic tourism data. Link » Kin Olivares · Oinam Nganba Meetei · Ruijun Ma · Rohan Reddy · Mengfei Cao 🔗 - Deep Generative model with Hierarchical Latent Factors for Timeseries Anomaly Detection (Poster)  link » Multivariate time-series anomaly detection has become an active area of research in recent years, with Deep Learning models outperforming previous approaches on benchmark datasets. Among reconstruction-based models, almost all previous work has focused on Variational Autoencoders and Generative Adversarial Networks. 
This work presents DGHL, a new family of generative models for time-series anomaly detection, trained by maximizing the observed likelihood directly by posterior sampling and alternating gradient-descent. A top-down Convolution Network maps time-series windows to a novel hierarchical latent space, exploiting temporal dynamics to encode information efficiently. Despite relying on posterior sampling, it is computationally more efficient than current approaches, with up to 10x shorter training times than RNN based models. Our method outperformed other state-of-the-art models on four popular benchmark datasets. Finally, DGHL is robust to variable features between entities and accurate even with large proportions of missing values, settings with increasing relevance with IoT. We demonstrate the superior robustness of DGHL with novel occlusion experiments in this literature. Link » Cristian Challu · Peihong Jiang · Ying Nian Wu · Laurent Callot 🔗 - Entropic Issues in Likelihood-Based OOD Detection (Poster)  link » Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. 
Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold. Link » Anthony Caterini · Gabriel Loaiza-Ganem 🔗 - Single Image Super-Resolution with Uncertainty Estimation for Lunar Satellite Images (Poster)  link » Recently, there has been a renewed interest in returning to the Moon, with many planned missions targeting the south pole. This region is of high scientific and commercial interest, mostly due to the presence of water-ice and other volatiles which could enable our sustainable presence on the Moon and beyond. In order to plan safe and effective crewed and robotic missions, access to high-resolution (<0.5 m) surface imagery is critical. However, the overwhelming majority (99.7%) of existing images over the south pole have spatial resolutions >1 m. Currently, the only way to obtain better images is to launch a new satellite mission to the Moon with better equipment to gather more precise data. In this work we develop an alternative that can be applied directly to previously gathered data, thereby saving considerable resources. It consists of a single image super-resolution (SR) approach based on generative adversarial networks that is able to super-resolve existing images from 1 m to 0.5 m resolution, unlocking a large catalogue of images (∼50,000) for more accurate mission planning in the region of interest for the upcoming missions. We show that our enhanced images reveal previously unseen hazards such as small craters and boulders, allowing safer traverse planning. Our approach also includes uncertainty estimation, which allows mission planners to understand the reliability of the super-resolved images.
Link » Jose Delgado-Centeno · Paula Harder · Ben Moseley · Valentin Bickel · Siddha Ganju · Miguel Olivares · Alfredo Kalaitzis 🔗 - Self-Supervised Anomaly Detection via Neural Autoregressive Flows with Active Learning (Poster)  link » Many self-supervised methods have been proposed for image anomaly detection. These methods often rely on the paradigm of data augmentation with predefined transformations such as flipping, cropping, and rotations. However, it is not straightforward to apply these techniques to non-image data, such as time series or tabular data, and the performance of existing deep approaches on tasks beyond images has fallen short of expectations. In this work, we propose a novel active learning (AL) scheme that relies on neural autoregressive flows (NAF) for self-supervised anomaly detection, specifically on small-scale data. Unlike other generative models such as GANs or VAEs, flow-based models explicitly learn the probability density and can thus assign accurate likelihoods to normal data, which makes them usable for detecting anomalies. The proposed NAF-AL method efficiently generates random samples from the latent space and transforms them into feature space, along with their likelihoods, via an invertible mapping. The samples with lower likelihoods are selected and further checked by outlier detection using Mahalanobis distance. The augmented samples, together with the normal samples, are used to train a better detector so as to approach the decision boundaries. Compared with random transformations, NAF-AL can be interpreted as a likelihood-oriented data augmentation that is more efficient and robust. Extensive experiments show that our approach outperforms existing baselines on multiple time series and tabular datasets, and on a real-world application in advanced manufacturing, with significant improvement in anomaly detection accuracy and robustness over the state-of-the-art.
Link » Jiaxin Zhang · Kyle Saleeby · Thomas Feldhausen · Sirui Bi · Alex Plotkowski · David Womble 🔗 - Content-Based Image Retrieval from Weakly-Supervised Disentangled Representations (Poster)  link » In content-based image retrieval (CBIR), a database of images is ordered based on the similarity to a query image. The similarity criterion is usually determined with respect to a shared category, e.g., whether the database images contain an object of the same type as depicted in the query. Depending on the situation, multiple similarity criteria can be relevant, such as the type of object, its color, or the depicted background. Ideally, a dataset labeled with all possible criteria information is available for training a model for computing the similarity. Typically, this is not the case. In this paper, we explore the use of disentangled representations for CBIR with respect to multiple criteria. To alleviate the need for labels, the models used to create the representations are learned via weak supervision by using data organized into groups with shared information. We show that such models can attain better retrieval performance compared to unsupervised baselines. Link » Luis Armando Pérez Rey · Dmitri Jarnikov · Mike Holenderski 🔗 - Deep Variational Semi-Supervised Novelty Detection (Poster)  link » In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to 'separate' between latent vectors for normal and outlier data.
We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods can be applied to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine, can be combined with any VAE model architecture, and are naturally compatible with ensembling. When comparing to state-of-the-art SSAD methods that are not specific to particular data types, we obtain marked improvement in outlier detection. Link » Tal Daniel · Thanard Kurutach · Aviv Tamar 🔗 - Controllable Network Data Balancing With GANs (Poster)  link » The scarcity of network traffic datasets has become a major impediment to recent traffic analysis research. Data collection is often hampered by privacy concerns, leaving researchers with no choice but to capture limited amounts of highly unbalanced network traffic. Furthermore, traffic classes, particularly network attacks, represent the minority making many techniques such as Deep Learning prone to failure. We address this issue by proposing a Generative Adversarial Network for balancing minority classes and generating highly customizable attack traffic. The framework regulates the generation process with conditional input vectors by creating flows that inherit similar characteristics from the original classes while preserving the flexibility to change their properties. We validate the generated samples with four tests. Our results show that the artificially augmented data is indeed similar to the original set and that the customization mechanism aids in the generation of personalized attack samples while remaining close to the original feature distribution. 
Link » Fares Meghdouri · Thomas Schmied · Thomas Gaertner · Tanja Zseby 🔗 - A Generalized and Distributable Generative Model for Private Representation Learning (Poster)  link » We study the problem of learning data representations that are private yet informative, i.e., providing information about intended "ally" targets while obfuscating sensitive "adversary" attributes. We propose a novel framework, Exclusion-Inclusion Generative Adversarial Network (EIGAN), that generalizes adversarial private representation learning (PRL) approaches to generate data encodings that account for multiple (possibly overlapping) ally and adversary targets. Preserving privacy is even more difficult when the data is collected across multiple distributed nodes, which for privacy reasons may not wish to share their data even for PRL training. Thus, learning such data representations at each node in a distributed manner (i.e., without transmitting source data) is of particular importance. This motivates us to develop D-EIGAN, the first distributed PRL method, based on fractional parameter sharing that promotes differentially private parameter sharing and also accounts for communication resource limitations. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and consider the impact of dependencies among ally and adversary tasks on the encoder performance. Our experiments on real-world and synthetic datasets demonstrate the advantages of EIGAN encodings in terms of accuracy, robustness, and scalability; in particular, we show that EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement). The experiments further reveal that D-EIGAN's performance is consistent with EIGAN under different node data distributions and is resilient to communication constraints. 
Link » Sheikh Shams Azam · Taejin Kim · Seyyedali Hosseinalipour · Carlee Joe-Wong · Saurabh Bagchi · Christopher Brinton 🔗 - Score-Based Generative Classifiers (Poster)  link » The tremendous success of generative models in recent years raises the question of whether they can also be used to perform classification. Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST, but this robustness has not been observed on more complex datasets like CIFAR-10. Additionally, on natural image datasets, previous results have suggested a trade-off between the likelihood of the data and classification accuracy. In this work, we investigate score-based generative models as classifiers for natural images. We show that these models not only obtain competitive likelihood values but simultaneously achieve state-of-the-art classification accuracy for generative classifiers on CIFAR-10. Nevertheless, we find that these models are only slightly, if at all, more robust than discriminative baseline models on out-of-distribution tasks based on common image corruptions. Similarly, and contrary to prior results, we find that score-based models are prone to worst-case distribution shifts in the form of adversarial perturbations. Our work highlights that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models. While they do not yet deliver on the promise of adversarial and out-of-domain robustness, they provide a different approach to classification that warrants further research. Link » Roland S. Zimmermann · Lukas Schott · Yang Song · Benjamin Dunn · David Klindt 🔗 - An Interpretability-augmented Genetic Expert for Deep Molecular Optimization (Poster)  link » The recently proposed genetic expert guided learning (GEGL) framework has demonstrated impressive performances on several de novo molecular design tasks.
Despite the displayed state-of-the-art results, the proposed system relies on an expert-designed genetic expert. Although hand-crafted experts allow efficient navigation of the chemical space, designing such experts requires a significant amount of effort and might introduce inherent biases which can potentially slow down convergence or even lead to sub-optimal solutions. In this research, we propose a novel genetic expert named InFrag which is free of design rules and can generate new molecules by combining promising molecular fragments. Fragments are obtained by using an additional graph convolutional neural network which computes attributions for each atom of a given molecule. Molecular substructures which contribute positively to the task score are kept and combined to propose novel molecules. We experimentally demonstrate that, within the GEGL framework, our proposed attribution-based genetic expert is either competitive with or outperforms the original expert-designed genetic expert on goal-directed optimization tasks. When limiting the number of optimization rounds to one and three rounds, a performance increase of approximately 43% and 20%, respectively, is observed compared to the baseline genetic expert. Link » Pierre Wüthrich · Jun Jin Choong · Shinya Yuki 🔗 - Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination (Poster)  link » In this paper, we propose Normality-Calibrated Autoencoder (NCAE), which can boost anomaly detection performance on contaminated datasets without any prior information or explicit abnormal samples in the training phase. The NCAE adversarially generates high-confidence normal samples from a latent space having low entropy and leverages them to predict abnormal samples in a training dataset. NCAE is trained to minimise reconstruction errors in uncontaminated samples and maximise reconstruction errors in contaminated samples.
The experimental results demonstrate that our method outperforms shallow, hybrid, and deep methods for unsupervised anomaly detection and achieves performance comparable to semi-supervised methods that use labelled anomaly samples in the training phase. The source code is publicly available on 'https://github.com/andreYoo/NCAE_UAD.git'. Link » Jongmin Yu · Hyeontaek Oh · Minkyung Kim · Junsik Kim 🔗 - Preventing posterior collapse in variational autoencoders for text generation via decoder regularization (Poster)  link » Variational autoencoders trained to minimize the reconstruction error are sensitive to the posterior collapse problem, that is, the proposal posterior distribution is always equal to the prior. We propose a novel regularization method based on fraternal dropout to prevent posterior collapse. We evaluate our approach using several metrics and observe improvements in all the tested configurations. Link » Alban Petit · Caio Corro 🔗 - Latent Space Refinement for Deep Generative Models (Poster)  link » Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. This methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions which must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on the combinations of Normalizing Flows and Generative Adversarial Networks.
Link » Ramon Winterhalder · Marco Bellagente · Benjamin Nachman 🔗 - Stochastic Video Prediction with Perceptual Loss (Poster)  link » Predicting future states is a challenging process in decision-making systems because of its inherently uncertain nature. Most works in this literature are based on deep generative networks, such as variational autoencoders, which use pixel-wise reconstruction in their loss functions. Predicting the future with pixel-wise reconstruction could fail to capture the full distribution of high-level representations and result in inaccurate and blurred predictions. In this paper, we propose stochastic video generation with perceptual loss (SVG-PL) to reduce uncertainty and blurred areas in future prediction. The proposed model combines a perceptual loss function and a pixel-wise loss function for image reconstruction and future state predictions. The model is built on a variational autoencoder that reduces the high-dimensional input to a latent variable capturing both the spatial information and the temporal dynamics needed for future prediction. We show that using a perceptual loss for video prediction improves reconstruction ability and results in clear predictions. Improvements in video prediction could further help the decision-making process in multiple downstream applications. Link » Donghun Lee · Ingook Jang · Seonghyun Kim · Chanwon Park · JUN HEE PARK 🔗 - Few-Shot Out-of-Domain Transfer of Natural Language Explanations (Poster)  link » Recently, there has been an increasing interest in models that generate natural language explanations (NLEs) for their decisions. However, training a model to explain its decisions in natural language requires the acquisition of task-specific NLEs, which is time- and resource-consuming. A potential solution is the out-of-domain transfer of NLEs, where explainability is transferred from a domain with rich data to a domain with scarce data via few-shot transfer learning.
In this work, we introduce and compare four approaches for few-shot transfer learning for NLEs. We transfer explainability from the natural language inference domain, where a large dataset of human-written NLEs already exists, to the domains of hard cases of pronoun resolution, and commonsense validation. Our results demonstrate that few-shot transfer far outperforms both zero-shot transfer and single-task training with few examples. We also investigate the scalability of the few-shot transfer of explanations, both in terms of training data and model size. Link » Yordan Yordanov · Vid Kocijan · Thomas Lukasiewicz · Oana M Camburu 🔗 - Learning Disentangled Representation for Spatiotemporal Graph Generation (Poster)  link » Modeling and understanding spatiotemporal graphs have been a long-standing research topic in network science and typically relies on network processing hypothesized by human knowledge. In this paper, we aim at pushing forward the modeling and understanding of spatiotemporal graphs via new disentangled deep generative models. Specifically, a new Bayesian model is proposed that factorizes spatiotemporal graphs into spatial, temporal, and graph factors as well as the factors that explain the interplay among them. A variational objective function and new mutual information thresholding algorithms driven by information bottleneck theory have been proposed to maximize the disentanglement among the factors with theoretical guarantees. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state-of-the-art by up to 69.2% for graph generation and 41.5% for interpretability. Link » Yuanqi Du · Xiaojie Guo · Hengning Cao · Yanfang (Fa Ye · Liang Zhao 🔗 - Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification (Poster)  link » Multi-label classification (MLC) is a prediction task where each sample can have more than one label.
We propose a novel contrastive learning boosted multi-label prediction model based on a Gaussian mixture variational autoencoder (C-GMVAE), which learns a multimodal prior space and employs a contrastive loss. Many existing methods introduce extra complex neural modules to capture the label correlations, in addition to the prediction modules. We find that by using contrastive learning in the supervised setting, we can exploit label information effectively, and learn meaningful feature and label embeddings capturing both the label correlations and predictive power, without extra neural modules. Our method also adopts the idea of learning and aligning latent spaces for both features and labels. More specifically, C-GMVAE imposes a Gaussian mixture structure on the latent space, to alleviate posterior collapse and over-regularization issues, in contrast to previous works based on a unimodal prior. C-GMVAE outperforms existing methods on multiple public datasets and can often match other models' full performance with only 50% of the training data. Furthermore, we show that the learnt embeddings provide insights into the interpretation of label-label interactions. Link » Junwen Bai · Shufeng Kong · Carla Gomes 🔗 - Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation (Poster)  link » The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This enables advances in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that allows modeling hazard factors in the original data space.
HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases. Link » Tobias Weber · Michael Ingrisch · Bernd Bischl · David Rügamer 🔗 - Finding Maximally Informative Patches in Images (Poster)  link » We consider the problem of distilling an image into an ordered set of maximally informative patches, given prior data from the same domain. We cast this problem as one of maximizing a pointwise mutual information (PMI) objective between a subset of an image's patches and the perceptual content of the entire image. We take an image synthesis-based approach, reasoning that the patches that are most informative would also be most useful for predicting other pixel values. We capture this idea with an image completion CNN trained to model the PMI between an image's perceptual content and any of its subregions. Because our PMI objective is a submodular, monotonic function, we can greedily construct patch sets using the CNN to obtain a provably close approximation to the intractable optimal solution. We evaluate our approach on datasets of faces, common objects, and line drawings. For all datasets, we find that surprisingly few patches are needed to reconstruct most images, demonstrating a particular type of redundancy of information in images, and new potentials in their sparse representations. We also show that these minimal patch sets may be used effectively for downstream tasks such as image classification.
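As a rough illustration of the greedy construction mentioned in the patch-selection abstract above, the following minimal sketch picks items one at a time by largest marginal gain. It is not the authors' implementation: the `patches` dictionary and the `coverage` objective are hypothetical stand-ins (a simple set-coverage function, which is monotone and submodular) for the CNN-based PMI estimate, and `greedy_select` is an illustrative name. For monotone submodular objectives under a cardinality constraint, greedy selection of this kind is known to reach at least a (1 - 1/e) fraction of the optimal value; the exact guarantee used in the paper is not stated in the abstract.

```python
def greedy_select(candidates, gain, k):
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = gain(chosen)
        # Marginal gain of adding candidate c to the current selection.
        best = max(remaining, key=lambda c: gain(chosen + [c]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: each "patch" covers a set of pixel indices.
patches = {
    "p0": {0, 1, 2, 3},
    "p1": {2, 3, 4},
    "p2": {5, 6},
    "p3": {0, 1},
}


def coverage(selected):
    """Toy objective (assumption): number of distinct pixels covered."""
    covered = set()
    for name in selected:
        covered |= patches[name]
    return len(covered)


order = greedy_select(patches, coverage, k=3)
print(order, coverage(order))
```

The returned list is ordered by selection round, which mirrors the "ordered set" of patches described in the abstract: earlier entries contribute more under the chosen objective.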
Link » Howard Zhong · Guha Balakrishnan · Richard Bowen · Ramin Zabih · Bill Freeman 🔗 - Accurate Imputation and Efficient Data Acquisition with Transformer-based VAEs (Oral)  link » Predicting missing values in tabular data, with uncertainty, is an essential task by itself as well as for downstream tasks such as personalized data acquisition. It is not clear whether state-of-the-art deep generative models for these tasks are well equipped to model the complex relationships that may exist between different features, especially when the subset of observed data are treated as a set. In this work we propose new attention-based models for estimating the joint conditional distribution of randomly missing values in mixed-type tabular data. The models improve on the state-of-the-art Partial Variational Autoencoder (Ma et al. 2019) on a range of imputation and information acquisition tasks. Link » Sarah Lewis · Tatiana Matejovicova · Yingzhen Li · Angus Lamb · Yordan Zaykov · Miltiadis Allamanis · Cheng Zhang 🔗 - AGE: Enhancing the Convergence on GANs using Alternating extra-gradient with Gradient Extrapolation (Poster)  link » Generative adversarial networks (GANs) are notably difficult to train since the parameters can get stuck in a local optimum. As a result, methods often suffer not only from degeneration of the convergence speed but also from limitations in the representational power of the trained network. Existing optimization methods to stabilize convergence require multiple gradient computations per iteration. We propose AGE, an alternating extra-gradient method with nonlinear gradient extrapolation, that overcomes these computational inefficiencies and exhibits better convergence properties. It estimates the lookahead step using a nonlinear mixing of past gradient sequences. Empirical results on CIFAR10, CelebA, and several synthetic datasets demonstrate that the introduced approach significantly improves convergence and yields better generative models.
Link » Huan He · Shifan Zhao · Yuanzhe Xi · Joyce Ho 🔗 - How to Reward Your Drug Agent? (Oral)  link » Constructing novel molecules from scratch using deep generative models provides a useful alternative to traditional virtual screening methods, which are limited to searching among already discovered chemicals. In particular, molecular optimisation combined with sampling guided by reinforcement learning seems like a promising path for discovering novel molecular designs and allows for domain-specific customization of the desired solutions. The choice of a chemically relevant reward function and the exhaustive assessment of its properties remains a challenging task. We introduce a reward function which gives enough flexibility to quantify the biological activity with respect to a selected protein target, drug-likeness, and synthesizability, and which incorporates a custom index of penalised physico-chemical properties. In order to customise the hyper-parameters influencing the RL agent performance, we propose a methodology which helps to quantify the chemical relevance of the reward function by quantifying the chemical relevance of the samples. We assess the performance of the reward function by docking the molecules with relevant protein targets and quantify the difference with the ground truth samples using Wasserstein distance. Link » Andrea Karlova · Wim Dehaen · Andrei Penciu 🔗 - VAEs meet Diffusion Models: Efficient and High-Fidelity Generation (Poster)  link » Diffusion Probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, Variational Autoencoders (VAEs) have access to a low-dimensional latent space but, despite recent advances, exhibit poor sample quality.
We present VAEDM, a novel generative framework for refining VAE generated samples using diffusion models while also presenting a novel conditional forward process parameterization for diffusion models. We show that the resulting parameterization can improve upon the unconditional diffusion model in terms of sampling efficiency during inference while also equipping diffusion models with the low-dimensional VAE inferred latent code. Furthermore, we show that the proposed model exhibits out-of-the-box capabilities for downstream tasks like image superresolution and denoising. Link » Kushagra Pandey · Avideep Mukherjee · Piyush Rai · Abhishek Kumar 🔗 - Content-Based Image Retrieval from Weakly-Supervised Disentangled Representations (Oral)  link » In content-based image retrieval (CBIR), a database of images is ordered based on the similarity to a query image. The similarity criterion is usually determined with respect to a shared category, e.g., whether the database images contain an object of the same type as depicted in the query. Depending on the situation, multiple similarity criteria can be relevant, such as the type of object, its color, or the depicted background. Ideally, a dataset labeled with all possible criteria information is available for training a model for computing the similarity. Typically, this is not the case. In this paper, we explore the use of disentangled representations for CBIR with respect to multiple criteria. To alleviate the need for labels, the models used to create the representations are learned via weak supervision by using data organized into groups with shared information. We show that such models can attain better retrieval performance compared to unsupervised baselines.
Link » Luis Armando Pérez Rey · Dmitri Jarnikov · Mike Holenderski 🔗 - Classifier-Free Diffusion Guidance (Oral)  link » Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. This method combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. We show that guidance can be performed by a pure generative model without such a classifier: we jointly train a conditional and an unconditional diffusion model, and find that it is possible to combine the resulting conditional and unconditional scores to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance. Link » Jonathan Ho · Tim Salimans 🔗 - Bayesian Image Reconstruction using Deep Generative Models (Poster)  link » Machine learning models are commonly trained end-to-end and in a supervised setting, using paired (input, output) data. Examples include recent super-resolution methods that train on pairs of (low-resolution, high-resolution) images. However, these end-to-end approaches require re-training every time there is a distribution shift in the inputs (e.g., night images vs daylight) or relevant latent variables (e.g., camera blur or hand motion). In this work, we leverage state-of-the-art (SOTA) generative models (here StyleGAN2) for building powerful image priors, which enable application of Bayes' theorem for many downstream reconstruction tasks. Our method, "Bayesian Reconstruction through Generative Models" (BRGM), uses a single pre-trained generator model to solve different image restoration tasks, i.e., super-resolution and in-painting, by combining it with different forward corruption models. 
We keep the weights of the generator model fixed, and reconstruct the image by estimating the Bayesian maximum a-posteriori (MAP) estimate over the input latent vector that generated the reconstructed image. We further use Variational Inference to approximate the posterior distribution over the latent vectors, from which we sample multiple solutions. We demonstrate BRGM on three large and diverse datasets: (i) 60,000 images from the Flickr Faces High Quality dataset, (ii) 240,000 chest X-rays from MIMIC III, and (iii) a combined collection of 5 brain MRI datasets with 7,329 scans. Across all three datasets and without any dataset-specific hyperparameter tuning, our simple approach yields performance competitive with current task-specific state-of-the-art methods on super-resolution and in-painting, while being more generalisable and without requiring any training. Our source code and pre-trained models are available online: https://razvanmarinescu.github.io/brgm/ Link » Razvan Marinescu · Daniel Moyer · Polina Golland 🔗 - Searching for the Weirdest Stars: A Convolutional Autoencoder-Based Pipeline For Detecting Anomalous Periodic Variable Stars (Oral)  link » The physical processes of stars are encoded in their periodic pulsations. Millions of variable stars will be observed by the upcoming Vera Rubin Observatory's Legacy Survey of Space and Time. Here, we present a convolutional autoencoder-based pipeline as an automatic approach to search for anomalous periodic variables within The Zwicky Transient Facility Catalog of Periodic Variable Stars (ZTF CPVS). We encode their light curves using a convolutional autoencoder, and we use an isolation forest to sort each periodic variable star by an anomaly score with the latent space. Our overall most anomalous events share some similarities: they are mostly highly variable and irregular evolved stars.
An exploration of multiwavelength data suggests that they are most likely Red Giant or Asymptotic Giant Branch stars concentrated in the disk of the Milky Way. Furthermore, we use the learned latent feature for the classification of periodic variables through a hierarchical random forest. This novel semi-supervised approach allows astronomers to identify the most anomalous events within a given physical class, accelerating the potential for scientific discovery. Link » Ho-Sang Chan · Siu Hei Cheung · Victoria Villar · Shirley Ho 🔗 - Your Dataset is a Multiset and You Should Compress it Like One (Poster)  link » Neural Compressors (NCs) are codecs that leverage neural networks and entropy coding to achieve competitive compression performance for images, audio, and other data types. These compressors exploit parallel hardware, and are particularly well suited to compressing i.i.d. batches of data. The average number of bits needed to represent each example is at least the well-known cross-entropy. However, the cross-entropy bound assumes the order of the compressed examples in a batch is preserved, which in many applications is not necessary. The number of bits used to implicitly store the order information is the logarithm of the number of unique permutations of the dataset. In this work, we present a method that reduces the bitrate of any codec by exactly the number of bits needed to store the order, at the expense of shuffling the dataset in the process. Conceptually, our method applies bits-back coding to a latent variable model with observed symbol counts (i.e. multiset) and a latent permutation defining the ordering, and does not require retraining any models. We present experiments with both lossy off-the-shelf codecs (WebP) as well as lossless NCs. On Binarized MNIST, lossless NCs achieved savings of up to 7.6%, while adding only 10% extra compute time.
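The order-information cost described in the multiset abstract above can be made concrete with a small calculation. The sketch below is not the authors' codec; it only evaluates the quantity the abstract names, the base-2 logarithm of the number of unique permutations of a dataset, which for n items with repeat counts c_1, ..., c_m is log2(n! / (c_1! ... c_m!)). The function name `order_bits` and the toy datasets are illustrative assumptions.

```python
import math
from collections import Counter


def order_bits(dataset):
    """Bits implicitly spent on ordering: log2 of the number of unique
    permutations of `dataset`, i.e. log2(n! / prod(count_i!))."""
    counts = Counter(dataset)
    n = len(dataset)
    # lgamma(k + 1) == ln(k!), which avoids computing huge factorials directly.
    log_perms = math.lgamma(n + 1) - sum(
        math.lgamma(c + 1) for c in counts.values()
    )
    return log_perms / math.log(2)


# Toy datasets (hypothetical): hashable items stand in for data examples.
all_distinct = ["a", "b", "c"]
with_repeats = ["a", "a", "b"]

print(order_bits(all_distinct))   # log2(3!) bits
print(order_bits(with_repeats))   # log2(3! / 2!) bits
```

Dividing `order_bits(dataset)` by `len(dataset)` gives the potential saving per example; repeated items shrink the saving because swapping identical items does not produce a new permutation.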
Link » Daniel Severo · James Townsend · Ashish Khisti · Alireza Makhzani · Karen Ullrich 🔗 - Probabilistic Hierarchical Forecasting with Deep Poisson Mixtures (Oral)  link » Hierarchical forecasting problems arise when time series compose a group structure that naturally defines aggregation and disaggregation coherence constraints for the predictions. In this work, we explore a new forecast representation, the Poisson Mixture Mesh (PMM), that can produce probabilistic, coherent predictions; it is compatible with the neural forecasting innovations, and defines simple aggregation and disaggregation rules capable of accommodating hierarchical structures, unknown during its optimization. We perform an empirical evaluation to compare the PMM to other methods on Australian domestic tourism data. Link » Kin Olivares · Oinam Nganba Meetei · Ruijun Ma · Rohan Reddy · Mengfei Cao 🔗 - Uncertainty-aware Labelled Augmentations for High Dimensional Latent Space Bayesian Optimization (Oral)  link » Black-box optimization problems are ubiquitous and of importance in many critical areas of science and engineering. Bayesian optimisation (BO) over the past years has emerged as one of the most successful techniques for optimising expensive black-box objectives. However, efficient scaling of BO to high-dimensional settings has proven to be extremely challenging. Traditional strategies based on projecting high-dimensional input data to a lower-dimensional manifold, such as Variational autoencoders (VAE) and Generative adversarial networks (GAN), have improved BO performance in high-dimensional regimes, but their dependence on excessive labeled input data has been widely reported. In this work, we target the data-greedy nature of deep generative models by constructing uncertainty-aware task-specific labeled data augmentations using Gaussian processes (GPs). 
Our approach outperforms existing state-of-the-art methods on machine learning tasks and demonstrates more informative data representation with limited supervision. Link » Ekansh Verma · Souradip Chakraborty 🔗 - A Binded VAE for Inorganic Material Generation (Oral)  link » Designing new industrial materials with desired properties can be very expensive and time consuming. The main difficulty is to generate compounds that correspond to realistic materials. Indeed, the description of compounds as vectors of components' proportions is characterized by severe sparsity. Furthermore, traditional generative model validation processes such as visual verification, FID and Inception scores cannot be used in this context. To tackle these issues, we develop an original Binded-VAE model tailored to generate sharp datasets with high sparsity. We validate the model with novel metrics adapted to the problem of compound generation. We show on a real rubber compound design problem that the proposed approach outperforms the standard generative models, which opens new perspectives for material design optimization. Link » Fouad OUBARI · Antoine De mathelin · Rodrigue Décatoire · Mathilde MOUGEOT 🔗 - Controllable Network Data Balancing With GANs (Oral)  link » The scarcity of network traffic datasets has become a major impediment to recent traffic analysis research. Data collection is often hampered by privacy concerns, leaving researchers with no choice but to capture limited amounts of highly unbalanced network traffic. Furthermore, traffic classes, particularly network attacks, represent the minority, making many techniques such as Deep Learning prone to failure. We address this issue by proposing a Generative Adversarial Network for balancing minority classes and generating highly customizable attack traffic. 
The framework regulates the generation process with conditional input vectors by creating flows that inherit similar characteristics from the original classes while preserving the flexibility to change their properties. We validate the generated samples with four tests. Our results show that the artificially augmented data is indeed similar to the original set and that the customization mechanism aids in the generation of personalized attack samples while remaining close to the original feature distribution. Link » Fares Meghdouri · Thomas Schmied · Thomas Gaertner · Tanja Zseby 🔗 - Palette: Image-to-Image Diffusion Models (Oral)  link » We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. Palette models trained on four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG restoration) outperform strong GAN and regression baselines and bridge the gap with natural images in terms of sample quality scores. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of an $L_2$ vs. $L_1$ loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task Palette model performs as well as or better than task-specific specialist counterparts. Check out https://bit.ly/palette-diffusion for more details. 
Link » Chitwan Saharia · William Chan · Huiwen Chang · Chris Lee · Jonathan Ho · Tim Salimans · David Fleet · Mohammad Norouzi 🔗 - Stochastic Video Prediction with Perceptual Loss (Oral)  link » Predicting future states is a challenging process in the decision-making system because of its inherently uncertain nature. Most works in this literature are based on deep generative networks such as variational autoencoders, which use pixel-wise reconstruction in their loss functions. Predicting the future with pixel-wise reconstruction could fail to capture the full distribution of high-level representations and result in inaccurate and blurred predictions. In this paper, we propose stochastic video generation with perceptual loss (SVG-PL) to better handle uncertainty and reduce blurred areas in future predictions. The proposed model combines a perceptual loss function and a pixel-wise loss function for image reconstruction and future state predictions. The model is built on a variational autoencoder that reduces the high-dimensional input to a latent variable capturing both the spatial information and the temporal dynamics of future prediction. We show that using a perceptual loss for video prediction improves reconstruction ability and results in clear predictions. Improvements in video prediction could further help the decision-making process in multiple downstream applications. Link » Donghun Lee · Ingook Jang · Seonghyun Kim · Chanwon Park · JUN HEE PARK 🔗 - Particle Dynamics for Learning EBMs (Poster)  link » Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. 
Nevertheless, all such sampling paradigms run MCMC targeting the current model, which requires infinitely long chains to generate samples from the true energy distribution and is problematic in practice. This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model. We accomplish this by viewing the evolution of the modeling distribution as (i) the evolution of the energy function, and (ii) the evolution of the samples from this distribution along some vector field. We subsequently derive this time-dependent vector field such that the particles following this field are approximately distributed as the current density model. Thereby we match the evolution of the particles with the evolution of the energy function prescribed by the learning procedure. Importantly, unlike Monte Carlo sampling, our method aims to match the current distribution in finite time. Finally, we demonstrate its effectiveness empirically by comparing it to MCMC-based learning methods. Link » Kirill Neklyudov · Priyank Jaini · Max Welling 🔗 - Instance Semantic Segmentation Benefits from Generative Adversarial Networks (Oral)  link » In the design of instance segmentation networks that reconstruct masks, segmentation is often taken as its literal definition -- assigning each pixel a label. This has led to thinking of the problem as a template matching one, with the goal of minimizing the loss between the reconstructed and the ground truth pixels. Rethinking reconstruction networks as a generator, we define the problem of predicting masks within a GAN game framework: a segmentation network generates the masks, and a discriminator network decides on the quality of the masks. To demonstrate this game, we show effective modifications on the general segmentation framework in Mask R-CNN. 
We find that playing the game in feature space is more effective than in pixel space, leading to stable training between the discriminator and the generator; that predicting object coordinates should be replaced by predicting contextual regions for objects; and that, overall, the adversarial loss helps performance and removes the need for any custom settings per data domain. We test our framework in various domains and report on cellphone recycling, autonomous driving, large-scale object detection, and medical glands. We observe that, in general, GANs yield masks that account for crisper boundaries, clutter, small objects, and details, whether in domains of regular shapes or of heterogeneous and coalescing shapes. Our code for reproducing the results is available publicly. Link » Quang Le · KAMAL YOUCEF-TOUMI · Dzmitry Tsetserukou · Ali Jahanian 🔗 - Sample-Efficient Generation of Novel Photo-acid Generator Molecules using a Deep Generative Model (Poster)  link » Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light. These compounds are critical components of the photolithography processes that are used in the manufacture of semiconductor logic and memory chips. The exponential increase in the demand for semiconductors has highlighted the need for discovering novel photo-acid generators. While de novo molecule design using deep generative models has been widely employed for drug discovery and material design, its application to the creation of novel photo-acid generators poses several unique challenges, such as lack of property labels. In this paper, we highlight these challenges and propose a generative modeling approach that utilizes conditional generation from a pre-trained deep autoencoder and expert-in-the-loop techniques. The validity of the proposed approach was evaluated with the help of subject matter experts, indicating the promise of such an approach for applications beyond the creation of novel photo-acid generators. 
Link » Samuel Hoffman · Vijil Chenthamarakshan · Dmitry Zubarev · Daniel Sanders · Payel Das 🔗 - Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations (Oral)  link » Controllable audio synthesis is a core element of creative sound design. Recent advancements in AI have made high-fidelity neural audio synthesis achievable. However, the high temporal resolution of audio and our perceptual sensitivity to small irregularities in waveforms make synthesizing at high sampling rates a complex and computationally intensive task, prohibiting real-time, controllable synthesis within many approaches. In this work we aim to shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis. Implicit neural representations (INRs) are neural networks used to approximate low-dimensional functions, trained to represent a single geometric object by mapping input coordinates to structural information at input locations. In contrast with other neural methods for representing geometric objects, the memory required to parameterize the object is independent of resolution, and only scales with its complexity. A corollary of this is that INRs have infinite resolution, as they can be sampled at arbitrary resolutions. To apply the concept of INRs in the generative domain we frame generative modelling as learning a distribution of continuous functions. This can be achieved by introducing conditioning methods to INRs. Our experiments show that small Periodic Conditional INRs (PCINRs) learn faster and generally produce quantitatively better audio reconstructions than Transposed Convolutional Neural Networks with equal parameter counts. However, their performance is very sensitive to activation scaling hyperparameters. When learning to represent more uniform sets, PCINRs tend to introduce artificial high-frequency components in reconstructions. 
We verify that this noise can be minimized by applying standard weight regularization during training or decreasing the compositional depth of PCINRs, and suggest directions for future research. Link » Jan Zuiderveld · Marco Federici · Erik Bekkers 🔗 - Conditional Generation of Periodic Signals with Fourier-Based Decoder (Oral)  link » Periodic signals play an important role in daily life. Although conventional sequential models have shown remarkable success in various fields, they still fall short in modeling periodicity; they either collapse, diverge or ignore details. In this paper, we introduce a novel framework inspired by Fourier series to generate periodic signals. We first decompose the given signals into multiple sines and cosines and then conditionally generate periodic signals with the output components. We show our model's efficacy on three tasks: reconstruction, imputation and conditional generation. Our model outperforms baselines in all tasks and shows more stable and refined results. Link » Jiyoung Lee · Wonjae Kim · DAEHOON GWAK · Edward Choi 🔗 - Finding Maximally Informative Patches in Images (Oral)  link » We consider the problem of distilling an image into an ordered set of maximally informative patches, given prior data from the same domain. We cast this problem as one of maximizing a pointwise mutual information (PMI) objective between a subset of an image's patches and the perceptual content of the entire image. We take an image synthesis-based approach, reasoning that the patches that are most informative would also be most useful for predicting other pixel values. We capture this idea with an image completion CNN trained to model the PMI between an image's perceptual content and any of its subregions. Because our PMI objective is a submodular, monotonic function, we can greedily construct patch sets using the CNN to obtain a provably close approximation to the intractable optimal solution. 
We evaluate our approach on datasets of faces, common objects, and line drawings. For all datasets, we find that surprisingly few patches are needed to reconstruct most images, demonstrating a particular type of redundancy of information in images, and new potential in their sparse representations. We also show that these minimal patch sets may be used effectively for downstream tasks such as image classification. Link » Howard Zhong · Guha Balakrishnan · Richard Bowen · Ramin Zabih · Bill Freeman 🔗 - Preventing posterior collapse in variational autoencoders for text generation via decoder regularization (Oral)  link » Variational autoencoders trained to minimize the reconstruction error are sensitive to the posterior collapse problem, that is, the proposal posterior distribution is always equal to the prior. We propose a novel regularization method based on fraternal dropout to prevent posterior collapse. We evaluate our approach using several metrics and observe improvements in all the tested configurations. Link » Alban Petit · Caio Corro 🔗 - An Interpretability-augmented Genetic Expert for Deep Molecular Optimization (Oral)  link » The recently proposed genetic expert guided learning (GEGL) framework has demonstrated impressive performance on several de novo molecular design tasks. Despite the displayed state-of-the-art results, the proposed system relies on an expert-designed genetic expert. Although hand-crafted experts allow efficient navigation of the chemical space, designing such experts requires a significant amount of effort and might introduce inherent biases, which can potentially slow down convergence or even lead to sub-optimal solutions. In this research, we propose a novel genetic expert named InFrag which is free of design rules and can generate new molecules by combining promising molecular fragments. 
Fragments are obtained by using an additional graph convolutional neural network which computes attributions for each atom of a given molecule. Molecular substructures which contribute positively to the task score are kept and combined to propose novel molecules. We experimentally demonstrate that, within the GEGL framework, our proposed attribution-based genetic expert is either competitive with or outperforms the original expert-designed genetic expert on goal-directed optimization tasks. When limiting the number of optimization rounds to one and three rounds, a performance increase of approximately $43\%$ and $20\%$ respectively is observed compared to the baseline genetic expert. Link » Pierre Wüthrich · Jun Jin Choong · Shinya Yuki 🔗 - Deep Generative model with Hierarchical Latent Factors for Timeseries Anomaly Detection (Oral)  link » Multivariate time-series anomaly detection has become an active area of research in recent years, with Deep Learning models outperforming previous approaches on benchmark datasets. Among reconstruction-based models, almost all previous work has focused on Variational Autoencoders and Generative Adversarial Networks. This work presents DGHL, a new family of generative models for time-series anomaly detection, trained by maximizing the observed likelihood directly by posterior sampling and alternating gradient-descent. A top-down Convolution Network maps time-series windows to a novel hierarchical latent space, exploiting temporal dynamics to encode information efficiently. Despite relying on posterior sampling, it is computationally more efficient than current approaches, with up to 10x shorter training times than RNN-based models. Our method outperformed other state-of-the-art models on four popular benchmark datasets. Finally, DGHL is robust to variable features between entities and accurate even with large proportions of missing values, settings with increasing relevance with IoT. 
We demonstrate the superior robustness of DGHL with novel occlusion experiments in this literature. Link » Cristian Challu · Peihong Jiang · Ying Nian Wu · Laurent Callot 🔗 - XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches (Oral)  link » Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations. Link » V Manushree · Sameer Saxena · Parna Chowdhury · Manisimha Varma Manthena · Harsh Rathod · Ankita Ghosh · Sahil Khose 🔗 - Variational Autoencoder with Differentiable Physics Engine for Human Gait Analysis and Synthesis (Oral)  link » We address the task of learning generative models of human gait. As gait motion always follows the physical laws, a generative model should also produce outputs that comply with the physical laws, particularly rigid body dynamics with contact and friction. We propose a deep generative model combined with a differentiable physics engine, which outputs physically plausible signals by construction. The proposed model is also equipped with a policy network conditioned on each sample. We show an example of the application of such a model to style transfer of gait. 
Link » Naoya Takeishi · Alexandros Kalousis 🔗 - Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation (Oral)  link » The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This enables advances in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that allows modeling hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases. Link » Tobias Weber · Michael Ingrisch · Bernd Bischl · David Rügamer 🔗 - A Generalized and Distributable Generative Model for Private Representation Learning (Oral)  link » We study the problem of learning data representations that are private yet informative, i.e., providing information about intended "ally" targets while obfuscating sensitive "adversary" attributes. We propose a novel framework, Exclusion-Inclusion Generative Adversarial Network (EIGAN), that generalizes adversarial private representation learning (PRL) approaches to generate data encodings that account for multiple (possibly overlapping) ally and adversary targets. Preserving privacy is even more difficult when the data is collected across multiple distributed nodes, which for privacy reasons may not wish to share their data even for PRL training. 
Thus, learning such data representations at each node in a distributed manner (i.e., without transmitting source data) is of particular importance. This motivates us to develop D-EIGAN, the first distributed PRL method, based on fractional parameter sharing that promotes differentially private parameter sharing and also accounts for communication resource limitations. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and consider the impact of dependencies among ally and adversary tasks on the encoder performance. Our experiments on real-world and synthetic datasets demonstrate the advantages of EIGAN encodings in terms of accuracy, robustness, and scalability; in particular, we show that EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement). The experiments further reveal that D-EIGAN's performance is consistent with EIGAN under different node data distributions and is resilient to communication constraints. Link » Sheikh Shams Azam · Taejin Kim · Seyyedali Hosseinalipour · Carlee Joe-Wong · Saurabh Bagchi · Christopher Brinton 🔗 - Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification (Oral)  link » Multi-label classification (MLC) is a prediction task where each sample can have more than one label. We propose a novel contrastive learning boosted multi-label prediction model based on a Gaussian mixture variational autoencoder (C-GMVAE), which learns a multimodal prior space and employs a contrastive loss. Many existing methods introduce extra complex neural modules to capture the label correlations, in addition to the prediction modules. We find that by using contrastive learning in the supervised setting, we can exploit label information effectively, and learn meaningful feature and label embeddings capturing both the label correlations and predictive power, without extra neural modules. 
Our method also adopts the idea of learning and aligning latent spaces for both features and labels. More specifically, C-GMVAE imposes a Gaussian mixture structure on the latent space, to alleviate posterior collapse and over-regularization issues, in contrast to previous works based on a unimodal prior. C-GMVAE outperforms existing methods on multiple public datasets and can often match other models' full performance with only 50\% of the training data. Furthermore, we show that the learnt embeddings provide insights into the interpretation of label-label interactions. Link » Junwen Bai · Shufeng Kong · Carla Gomes 🔗 - Deep Variational Semi-Supervised Novelty Detection (Oral)  link » In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to "separate" between latent vectors for normal and outlier data. We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods can be applied to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine, can be combined with any VAE model architecture, and are naturally compatible with ensembling. When comparing to state-of-the-art SSAD methods that are not specific to particular data types, we obtain marked improvement in outlier detection. 
Link » Tal Daniel · Thanard Kurutach · Aviv Tamar 🔗 - Latent Space Refinement for Deep Generative Models (Oral)  link » Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. This methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions which must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on the combinations of Normalizing Flows and Generative Adversarial Networks. Link » Ramon Winterhalder · Marco Bellagente · Benjamin Nachman 🔗 - Improving Model Compatibility of Generative Adversarial Networks by Boundary Calibration (Oral)  link » Generative Adversarial Networks (GANs) are a powerful family of models that learn an underlying distribution to generate synthetic data. Many existing studies of GANs focus on improving the realness of the generated image data for visual applications, and few are concerned with improving the quality of the generated data for training other classifiers---a task known as the model compatibility problem. As a consequence, existing GANs often prefer generating "easier" synthetic data that are far from the boundaries of the classifiers, and refrain from generating near-boundary data, which are known to play an important role in training the classifiers. 
To improve GANs in terms of model compatibility, we propose Boundary-Calibration GANs (BCGANs), which leverage the boundary information from a set of pre-trained classifiers using the original data. In particular, we introduce an auxiliary Boundary-Calibration loss (BC-loss) into the generator of GAN to match the statistics between the posterior distributions of original data and generated data with respect to the boundaries of the pre-trained classifiers. The BC-loss is provably unbiased and can be easily coupled with different GAN variants to improve their model compatibility. Experimental results demonstrate that BCGANs not only generate realistic images like original GANs but also achieve better model compatibility than the original GANs. Link » Si-An Chen · Chun-Liang Li · Hsuan-Tien Lin 🔗 - Learning Disentangled Representation for Spatiotemporal Graph Generation (Oral)  link » Modeling and understanding spatiotemporal graphs have been a long-standing research topic in network science and typically relies on network processing hypothesized by human knowledge. In this paper, we aim at pushing forward the modeling and understanding of spatiotemporal graphs via new disentangled deep generative models. Specifically, a new Bayesian model is proposed that factorizes spatiotemporal graphs into spatial, temporal, and graph factors as well as the factors that explain the interplay among them. A variational objective function and new mutual information thresholding algorithms driven by information bottleneck theory have been proposed to maximize the disentanglement among the factors with theoretical guarantees. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state-of-the-art by up to 69.2\% for graph generation and 41.5\% for interpretability. 
Link » Yuanqi Du · Xiaojie Guo · Hengning Cao · Yanfang (Fa Ye · Liang Zhao 🔗 - Score-Based Generative Classifiers (Oral)  link » The tremendous success of generative models in recent years raises the question of whether they can also be used to perform classification. Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST, but this robustness has not been observed on more complex datasets like CIFAR-10. Additionally, on natural image datasets, previous results have suggested a trade-off between the likelihood of the data and classification accuracy. In this work, we investigate score-based generative models as classifiers for natural images. We show that these models not only obtain competitive likelihood values but simultaneously achieve state-of-the-art classification accuracy for generative classifiers on CIFAR-10. Nevertheless, we find that these models are only slightly, if at all, more robust than discriminative baseline models on out-of-distribution tasks based on common image corruptions. Similarly and contrary to prior results, we find that score-based models are prone to worst-case distribution shifts in the form of adversarial perturbations. Our work highlights that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models. While they do not yet deliver on the promise of adversarial and out-of-domain robustness, they provide a different approach to classification that warrants further research. Link » Roland S. Zimmermann · Lukas Schott · Yang Song · Benjamin Dunn · David Klindt 🔗 - Transparent Liquid Segmentation for Robotic Pouring (Oral)  link » Liquid state estimation is important for robotics tasks such as pouring; however, estimating the state of transparent liquids is a challenging problem. 
We propose a novel segmentation pipeline that can segment transparent liquids such as water from a static RGB image without requiring any manual annotations or heating of the liquid for training. Instead, we use a generative model that is capable of translating unpaired images of colored liquids into synthetically generated transparent liquid images. Segmentation labels of colored liquids are obtained automatically using background subtraction. We use paired samples of synthetically generated transparent liquid images and background subtraction for our segmentation pipeline. Our experiments show that we are able to accurately predict a segmentation mask for transparent liquids without requiring any manual annotations. We demonstrate the utility of transparent liquid segmentation in a robotic pouring task that controls pouring by perceiving liquid height in a transparent cup. Accompanying video and supplementary information can be found at https://sites.google.com/view/roboticliquidpouring Link » Gautham Narayan Narasimhan · Kai Zhang · Benjamin Eisner · Xingyu Lin · David Held 🔗 - Grapher: Multi-Stage Knowledge Graph Construction using Pretrained Language Models (Poster)  link » In this work we address the problem of Knowledge Graph (KG) construction from text, proposing a novel end-to-end multi-stage Grapher system that separates the overall generation process into two stages. The graph nodes are generated first using a pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the textual descriptions. For each stage we propose several architectural choices that can be used depending on the available training resources. We evaluated Grapher on the recent WebNLG 2020 Challenge dataset, achieving competitive results on the text-to-RDF generation task, as well as on the recent large-scale TekGen dataset, showing strong overall performance. 
We believe that the proposed Grapher system can serve as a viable KG construction alternative to the existing linearization or sampling-based graph generation approaches. Link » Igor Melnyk · Pierre Dognin · Payel Das 🔗 - Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination (Oral)  link » In this paper, we propose the Normality-Calibrated Autoencoder (NCAE), which can boost anomaly detection performance on contaminated datasets without any prior information or explicit abnormal samples in the training phase. The NCAE adversarially generates high-confidence normal samples from a latent space having low entropy and leverages them to predict abnormal samples in a training dataset. NCAE is trained to minimise reconstruction errors in uncontaminated samples and maximise reconstruction errors in contaminated samples. The experimental results demonstrate that our method outperforms shallow, hybrid, and deep methods for unsupervised anomaly detection and achieves performance comparable to semi-supervised methods that use labelled anomaly samples in the training phase. The source code is publicly available at https://github.com/andreYoo/NCAE_UAD.git. Link » Jongmin Yu · Hyeontaek Oh · Minkyung Kim · Junsik Kim 🔗 - Self-Supervised Anomaly Detection via Neural Autoregressive Flows with Active Learning (Oral)  link » Many self-supervised methods have been proposed for image anomaly detection. These methods often rely on the paradigm of data augmentation with predefined transformations such as flipping, cropping, and rotations. However, it is not straightforward to apply these techniques to non-image data, such as time series or tabular data, and the performance of existing deep approaches has fallen short of expectations on tasks beyond images. 
In this work, we propose a novel active learning (AL) scheme that relies on neural autoregressive flows (NAF) for self-supervised anomaly detection, specifically on small-scale data. Unlike other generative models such as GANs or VAEs, flow-based models explicitly learn the probability density and can thus assign accurate likelihoods to normal data, which makes them usable for detecting anomalies. The proposed NAF-AL method works by efficiently generating random samples from latent space and transforming them into feature space along with likelihoods via an invertible mapping. The samples with lower likelihoods are selected and further checked by outlier detection using Mahalanobis distance. The augmented samples, together with normal samples, are used for training a better detector so as to approach decision boundaries. Compared with random transformations, NAF-AL can be interpreted as a likelihood-oriented data augmentation that is more efficient and robust. Extensive experiments show that our approach outperforms existing baselines on multiple time series and tabular datasets, and in a real-world application in advanced manufacturing, with significant improvement in anomaly detection accuracy and robustness over the state-of-the-art. Link » Jiaxin Zhang · Kyle Saleeby · Thomas Feldhausen · Sirui Bi · Alex Plotkowski · David Womble 🔗 - Semi-supervised Multiple Instance Learning using Variational Auto-Encoders (Oral)  link » We consider the multiple-instance learning (MIL) paradigm, which is a special case of supervised learning where training instances are grouped into bags. In MIL, the hidden instance labels do not have to be the same as the label of the bag that contains them. On the other hand, the hybrid modelling approach is known to possess advantages due to the smooth consolidation of both discriminative and generative components. 
In this paper, we investigate whether we can get the best of both worlds (MIL and hybrid modelling), especially in a semi-supervised learning (SSL) setting. We first integrate a variational autoencoder (VAE), which is a powerful deep generative model, with an attention-based MIL classifier, then evaluate the performance of the resulting model in SSL. We assess the proposed approach on an established benchmark as well as a real-world medical dataset. Link » Ali Nihat Uzunalioglu · Tameem Adel · Jakub M. Tomczak 🔗 - Certifiably Robust Variational Autoencoders (Oral)  link » We introduce an approach for training Variational Autoencoders (VAEs) that are certifiably robust to adversarial attack. Specifically, we first derive actionable bounds on the minimal size of an input perturbation required to change a VAE's reconstruction by more than an allowed amount, with these bounds depending on certain key parameters such as the Lipschitz constants of the encoder and decoder. We then show how these parameters can be controlled, thereby providing a mechanism to ensure a priori that a VAE will attain a desired level of robustness. Moreover, we extend this to a complete practical approach for training such VAEs to ensure our criteria are met. Critically, our method allows one to specify a desired level of robustness upfront and then train a VAE that is guaranteed to achieve this robustness. We further demonstrate that these Lipschitz-constrained VAEs are more robust to attack than standard VAEs in practice. Link » Ben Barrett · Alexander Camuto · Matthew Willetts · Thomas Rainforth 🔗 - Entropic Issues in Likelihood-Based OOD Detection (Oral)  link » Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. 
However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold. Link » Anthony Caterini · Gabriel Loaiza-Ganem 🔗 - Single Image Super-Resolution with Uncertainty Estimation for Lunar Satellite Images (Oral)  link » Recently, there has been a renewed interest in returning to the Moon, with many planned missions targeting the south pole. This region is of high scientific and commercial interest, mostly due to the presence of water-ice and other volatiles which could enable our sustainable presence on the Moon and beyond. In order to plan safe and effective crewed and robotic missions, access to high-resolution (<0.5 m) surface imagery is critical. However, the overwhelming majority (99.7%) of existing images over the south pole have spatial resolutions >1 m. Currently, the only way to obtain better images is to launch a new satellite mission to the Moon with better equipment to gather more precise data. In this work we develop an alternative that can be used directly on previously gathered data and therefore saves substantial resources. 
It consists of a single image super-resolution (SR) approach based on generative adversarial networks that is able to super-resolve existing images from 1 m to 0.5 m resolution, unlocking a large catalogue of images (∼50,000) for more accurate mission planning in the region of interest for the upcoming missions. We show that our enhanced images reveal previously unseen hazards such as small craters and boulders, allowing safer traverse planning. Our approach also includes uncertainty estimation, which allows mission planners to understand the reliability of the super-resolved images. Link » Jose Delgado-Centeno · Paula Harder · Ben Moseley · Valentin Bickel · Siddha Ganju · Miguel Olivares · Alfredo Kalaitzis 🔗 - Few-Shot Out-of-Domain Transfer of Natural Language Explanations (Oral)  link » Recently, there has been an increasing interest in models that generate natural language explanations (NLEs) for their decisions. However, training a model to explain its decisions in natural language requires the acquisition of task-specific NLEs, which is time- and resource-consuming. A potential solution is the out-of-domain transfer of NLEs, where explainability is transferred from a domain with rich data to a domain with scarce data via few-shot transfer learning. In this work, we introduce and compare four approaches for few-shot transfer learning for NLEs. We transfer explainability from the natural language inference domain, where a large dataset of human-written NLEs already exists, to the domains of hard cases of pronoun resolution and commonsense validation. Our results demonstrate that few-shot transfer far outperforms both zero-shot transfer and single-task training with few examples. We also investigate the scalability of the few-shot transfer of explanations, both in terms of training data and model size. 
Link » Yordan Yordanov · Vid Kocijan · Thomas Lukasiewicz · Oana M Camburu 🔗 - Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations (Poster) 🔗

Author Information

Cheng Zhang (Microsoft Research, Cambridge, UK)

Cheng Zhang is a principal researcher at Microsoft Research Cambridge, UK. She leads the Data Efficient Decision Making (Project Azua) team at Microsoft. Before joining Microsoft, she was with the statistical machine learning group of Disney Research Pittsburgh, located at Carnegie Mellon University. She received her Ph.D. from the KTH Royal Institute of Technology. She is interested in advancing machine learning methods, including variational inference, deep generative models, and sequential decision-making under uncertainty, and in adapting machine learning to socially impactful applications such as education and healthcare. She co-organized the Symposium on Advances in Approximate Bayesian Inference from 2017 to 2019.