The score function, which is the gradient of the log-density, provides a unique way to represent probability distributions. By working with distributions through score functions, researchers have been able to develop efficient tools for machine learning and statistics, collectively known as score-based methods.
Score-based methods have had a significant impact on largely disjoint subfields of machine learning and statistics, such as generative modeling, Bayesian inference, hypothesis testing, control variates, and Stein’s method. For example, score-based generative models, or denoising diffusion models, have emerged as the state-of-the-art technique for generating high-quality and diverse images. In addition, recent developments in Stein’s method and score-based approaches for stochastic differential equations (SDEs) have contributed to the development of fast and robust Bayesian posterior inference in high dimensions. These advances have potential applications in engineering fields, where they could help improve simulation models.
At our workshop, we will bring together researchers from these various subfields to discuss the success of score-based methods, and identify common challenges across different research areas. We will also explore the potential for applying score-based methods to even more real-world applications, including in computer vision, signal processing, and computational chemistry. By doing so, we hope to foster collaboration among researchers and build a more cohesive research community focused on score-based methods.
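For readers new to the area, the central object is simple to state: for a density $p(x)$, the score is $\nabla_x \log p(x)$, and denoising score matching trains a network $s_\theta$ to approximate it from noise-perturbed samples. As a minimal illustration (the standard textbook formulation, not tied to any particular contribution below):

$$
s_\theta(\tilde{x}) \approx \nabla_{\tilde{x}} \log p_\sigma(\tilde{x}), \qquad
\mathcal{L}_{\mathrm{DSM}}(\theta) = \mathbb{E}_{x \sim p,\ \tilde{x} \sim \mathcal{N}(x, \sigma^2 I)}
\left[ \Big\| s_\theta(\tilde{x}) + \frac{\tilde{x} - x}{\sigma^{2}} \Big\|^{2} \right],
$$

where $p_\sigma$ denotes the data density convolved with Gaussian noise of scale $\sigma$.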
Fri 6:50 a.m. - 7:00 a.m. | Introduction and opening remarks (Introduction)
Fri 7:00 a.m. - 7:30 a.m. | Invited Talk: Karsten Kreis (Talk)
Fri 7:30 a.m. - 8:00 a.m. | Invited Talk: Tommi Jaakkola (Talk)
Fri 8:00 a.m. - 9:00 a.m. | Poster session 1 (Poster) | Yingzhen Li
Fri 9:00 a.m. - 10:00 a.m. | Panel discussion (Discussion Panel)
Fri 10:00 a.m. - 10:30 a.m. | Contributed talk session 1 (Talk)
Fri 10:30 a.m. - 11:30 a.m. | Lunch break
Fri 11:30 a.m. - 12:00 p.m. | Invited Talk: Guan-Horng Liu (Talk)
Fri 12:00 p.m. - 12:30 p.m. | Invited Talk: Tamara Fernandez (Talk)
Fri 12:30 p.m. - 1:00 p.m. | Contributed talk session 2 (Talk)
Fri 1:00 p.m. - 2:00 p.m. | Poster session 2 (Poster)
Fri 2:00 p.m. - 2:30 p.m. | Invited Talk: Chenlin Meng (Talk)
Fri 2:30 p.m. - 3:00 p.m. | Invited Talk: Mohammad Norouzi (Talk)

Modeling Temporal Data as Continuous Functions with Process Diffusion (Poster)
Temporal data like time series are often observed at irregular intervals, which is a challenging setting for existing machine learning methods. To tackle this problem, we view such data as samples from some underlying continuous function. We then define a diffusion-based generative model that adds noise from a predefined stochastic process while preserving the continuity of the resulting underlying function. A neural network is trained to reverse this process, which allows us to sample new realizations from the learned distribution. We define suitable stochastic processes as noise sources and introduce novel denoising and score-matching models on processes. Further, we show how to apply this approach to multivariate probabilistic forecasting and imputation tasks. Through our extensive experiments, we demonstrate that our method outperforms previous models on synthetic and real-world datasets.
Marin Biloš · Kashif Rasul · Anderson Schneider · Yuriy Nevmyvaka · Stephan Günnemann

Self-Guided Diffusion Model (Poster)
Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, such guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we aim to eliminate the need for such annotation by instead leveraging the flexibility of self-supervision signals to design a framework for self-guided diffusion models. By leveraging a feature extraction function and a self-annotation function, our method provides flexible guidance signals at various image granularities: from the level of holistic images to object boxes and even segmentation masks. |
TAO HU · David Zhang · Yuki Asano · Gertjan Burghouts · Cees Snoek

Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition (Poster)
Combining predictive distributions is a central problem in Bayesian inference and machine learning. Currently, predictives are almost exclusively combined using linear density-mixtures such as Bayesian model averaging, Bayesian stacking, and mixture of experts. Nonetheless, linear mixtures impose traits that might be undesirable for some applications, such as multi-modality. While there are alternative strategies (e.g., geometric bridge or superposition), optimizing their parameters usually implies repeatedly computing intractable normalizing constants. In this extended abstract, we present two novel Bayesian model combination tools. They are generalizations of \emph{stacking}, but combine posterior densities by log-linear pooling (\emph{locking}) and quantum superposition (\emph{quacking}). To optimize model weights while avoiding the burden of normalizing constants, we maximize the Hyv\"arinen score of the combined posterior predictions. We demonstrate locking and quacking with an illustrative example.
Yuling Yao · Luiz Carvalho · Diego Mesquita

Action Matching: A Variational Method for Learning Stochastic Dynamics from Samples (Poster)
Stochastic dynamics are ubiquitous in many fields of science, from the evolution of quantum systems in physics to diffusion-based models in machine learning. Existing methods such as score matching can be used to simulate these physical processes by assuming that the dynamics is a diffusion, which is not always the case. In this work, we propose a method called "Action Matching" that enables us to learn a much broader family of stochastic dynamics. Our method requires access only to samples from different time-steps, makes no explicit assumptions about the underlying dynamics, and can be applied even when samples are uncorrelated (i.e., are not part of a trajectory). Action Matching directly learns the underlying mechanism that moves samples in time without modeling the distributions at each time-step. In this work, we showcase how Action Matching can be used for generative modeling for computer vision tasks and discuss potential applications in other areas of science. |
Kirill Neklyudov · Daniel Severo · Alireza Makhzani

Likelihood Score under Generalized Self-Concordance (Poster)
We show how, under a generalized self-concordance assumption and possible model misspecification, we can establish non-asymptotic bounds on the normalized likelihood score when using maximum likelihood or score matching. The tail behavior is governed by an effective dimension corresponding to the trace of the sandwich covariance. We also show how our non-asymptotic approach allows us to obtain confidence sets for the estimator and analyze Rao's score test.
Lang Liu · Zaid Harchaoui

Noise-conditional Maximum Likelihood Estimation with Score-based Sampling (Poster)
We introduce a simple yet effective modification to the standard maximum likelihood estimation (MLE) framework for autoregressive generative models. Rather than maximizing a single unconditional likelihood of the data under the model, we maximize a family of \textit{noise-conditional} likelihoods consisting of the data perturbed by a continuum of noise levels. We find that models trained this way are more robust to noise, obtain higher test likelihoods, and generate higher quality images. They can also be sampled from via a novel score-based sampling scheme which combats the classical \textit{covariate shift} problem that occurs during sample generation in autoregressive models. Applying this augmentation to autoregressive image models, we obtain 3.32 bits per dimension on the ImageNet 64x64 dataset, and substantially improve the quality of generated samples in terms of the Frechet Inception distance (FID) --- from 37.50 to 13.50 on the CIFAR-10 dataset. |
Henry Li · Yuval Kluger

Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models (Poster)
Diffusion and score-based generative models have achieved remarkable sample quality on difficult image synthesis tasks. Many works have proposed samplers for pretrained diffusion models, including ancestral samplers, SDE and ODE integrators and annealed MCMC approaches. So far, the best sample quality has been achieved with samplers that use time-conditional score functions and move between several noise levels. However, estimating an accurate score function at many noise levels can be challenging and requires an architecture that is more expressive than would be needed for a single noise level. In this work, we explore MCMC sampling algorithms that operate at a single noise level, yet synthesize images with acceptable sample quality on the CIFAR-10 dataset. We show that while naive application of Langevin dynamics and a related noise-denoise sampler produces poor samples, methods built on integrators of underdamped Langevin dynamics using splitting methods can perform well. Further, by combining MCMC methods with existing multiscale samplers, we begin to approach competitive sample quality without using scores at large noise levels. |
Ajay Jain · Ben Poole

First hitting diffusion models (Poster)
We propose a family of First Hitting Diffusion Models (FHDM), deep generative models that generate data with a diffusion process that terminates at a random first hitting time. This yields an extension of the standard fixed-time diffusion models that terminate at a pre-specified deterministic time. Although standard diffusion models are designed for continuous unconstrained data, FHDM is naturally designed to learn distributions on continuous as well as a range of discrete and structured domains.
Mao Ye · Lemeng Wu · Qiang Liu

Score-based generative models learn manifold-like structures with constrained mixing (Poster)
How do score-based generative models (SBMs) learn the data distribution supported on a lower-dimensional manifold? We investigate the score model of a trained SBM through its linear approximations and subspaces spanned by local feature vectors. As the noise decreases during diffusion, the local dimensionality increases and becomes more varied between different sample sequences. Importantly, we find that the learned vector field mixes images by a non-conservative field within the manifold, although it denoises with normal projections as if there is a potential function in off-manifold directions. At each noise level, the subspace spanned by the local features overlaps with an effective density function. These observations suggest that SBMs can flexibly mix samples with the learned score field while carefully maintaining a manifold-like structure of the data distribution.
Li Kevin Wenliang · Ben Moran

Exploring the Design Space of Generative Diffusion Processes for Sparse Graphs (Poster)
We extend score-based generative diffusion processes (GDPs) to sparse graphs and other inherently discrete data, with a focus on scalability. GDPs apply diffusion to training samples, then learn a reverse process generating new samples out of noise. Previous work applying GDPs to discrete data effectively relaxes discrete variables to continuous ones. Our approach is different: we consider jump diffusion (i.e., diffusion with punctual discontinuities) in $\mathbb{R}^d \times \mathcal{G}$ where $\mathcal{G}$ models discrete components of the data. We focus our attention on sparse graphs: our \textsc{Dissolve} process gradually breaks apart a graph $(V,E) \in \mathcal{G}$ in a certain number of distinct jump events. This confers significant advantages compared to GDPs that use less efficient representations and/or that destroy the graph information in a sudden manner. Gaussian kernels allow for efficient training with denoising score matching; standard GDP methods can be adapted with just an extra argument to the score function. We consider improvement opportunities for \textsc{Dissolve} and discuss necessary conditions to generalize to other kinds of inherently discrete data.
Pierre-André Noël · Pau Rodriguez

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning (Poster)
Model-based reinforcement learning (MBRL) exhibits favorable performance in practice, but its theoretical guarantees are mostly restricted to the setting when the transition model is Gaussian or Lipschitz and demands a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assumptions on the target transition model to belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a \emph{Bayesian coreset} of only statistically significant past state-action pairs; and (iii) {exhibits a Bayesian regret of $\mathcal{O}(dH^{1+({\alpha}/{2})}T^{1-({\alpha}/{2})})$ with coreset size of $\Omega(\sqrt{T^{1+\alpha}})$, where $d$ is the aggregate dimension of state action space, $H$ is the episode length, $T$ is the total number of time steps experienced, and $\alpha\in (0,1]$ is the tuning parameter which is a novel introduction into the analysis of MBRL in this work}. To achieve these results, we adopt an approach based upon Stein's method, which allows distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to $50\%$ reduction in wall clock time in some continuous control environments.
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Pratap Tokekar · Dinesh Manocha

Unsupervised Controllable Generation with Score-based Diffusion Models: Disentangled Latent Code Guidance (Poster)
Owing to their impressive empirical success, score-based diffusion models have recently been spotlighted among generative models. In real-world applications, controllable generation enriches the impact of diffusion models. This paper aims to address that challenge by presenting a method for control in an unsupervised manner. We propose the Latent Code Guidance Diffusion Model (LCG-DM), which is the first approach to apply disentanglement to score-based diffusion models. A disentangled latent code can be considered a pseudo-label, since it separately expresses semantic information in each dimension. LCG-DM is a score-based diffusion model that reflects the disentangled latent code as the condition. LCG-DM shows the best performance among baselines in terms of both sample quality and disentanglement on the dSprites dataset. LCG-DM can manipulate images on the CelebA dataset, with FID performance comparable to non-disentangling score-based diffusion models. Furthermore, we provide experimental results for a scaling method that reflects the pseudo-label more strongly on the MNIST dataset.
Yeongmin Kim · Dongjun Kim · Hyeonmin Lee · Il-chul Moon

Molecular Docking with Diffusion Generative Models (Oral)
Predicting the binding structure of a small molecule to a protein, a task known as molecular docking, is critical to drug design. Recent deep learning methods that frame docking as a regression problem have yet to offer substantial improvements over traditional search-based methods. We identify the drawbacks of a regression-based approach and instead view molecular docking as a generative modeling problem. We develop DockDiff, a novel diffusion process and generative model over the main degrees of freedom involved during docking. Empirically, DockDiff obtains a 37% top-1 success rate (RMSD < 2A) on PDBBind, significantly outperforming the previous state-of-the-art of traditional docking (23%) and deep learning (20%) methods.
Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola

Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models (Poster)
Generative models based on denoising diffusion techniques have led to an unprecedented increase in the quality and diversity of imagery that is now possible to create with neural generative models. However, most contemporary state-of-the-art methods are derived from a standard isotropic Gaussian formulation. In this work we examine the situation where non-isotropic Gaussian distributions are used. We present the key mathematical derivations for creating denoising diffusion models using an underlying non-isotropic Gaussian noise model. We also provide initial experiments to help verify empirically that this more general modelling approach can also yield high-quality samples. |
Vikram Voleti · Chris Pal · Adam Oberman

Statistical Efficiency of Score Matching: The View from Isoperimetry (Oral)
Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or gradients thereof cannot be explicitly or efficiently written down. Score matching is a training method, whereby instead of fitting the likelihood $\log p(x)$ for the training data, we instead fit the score function $\nabla_x \log p(x)$ --- obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it is unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood --- which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated --- i.e. the Poincar\'e, log-Sobolev and isoperimetric constant --- quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to the maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant --- even for simple families of distributions like exponential families with rich enough sufficient statistics --- score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite sample regime, and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
Frederic Koehler · Alexander Heckett · Andrej Risteski
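For reference, the score matching objective discussed in this abstract has the standard (Hyvärinen) form

$$
J(\theta) = \mathbb{E}_{x \sim p}\left[ \tfrac{1}{2} \big\| s_\theta(x) \big\|^{2} + \nabla_x \cdot s_\theta(x) \right],
$$

which equals $\tfrac{1}{2}\,\mathbb{E}_{x \sim p}\big\| s_\theta(x) - \nabla_x \log p(x) \big\|^{2}$ up to an additive constant independent of $\theta$; this is background notation rather than a result of the paper.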

Score-Based Generative Models with Lévy Processes (Poster)
Time reversibility of stochastic processes is a primary cornerstone of the score-based generative models through stochastic differential equations (SDEs). While a broader class of Markov processes is reversible, previous continuous-time approaches restrict the range of noise processes to Brownian motion (BM) since the closed-form of the time reversal formula is only known for diffusion processes. In this paper, to expand the class of noise distributions, we propose a class of score-based probabilistic generative models, the Lévy-Itō Model (LIM), which utilizes the $\alpha$-stable distribution for noise injection. To this end, we derive an approximate time reversal formula for SDEs with Lévy processes that can allow discontinuous pure jump motion. Consequently, we advance the score-based generative models with a broad range of non-Gaussian Markov processes. Empirical results on MNIST, CIFAR-10, CelebA, and CelebA-HQ show that our approach is valid.
Eunbi Yoon · Keehun Park · Jinhyeok Kim · Sungbin Lim

Fast Sampling of Diffusion Models via Operator Learning (Poster)
Diffusion models have found widespread adoption in various areas. However, sampling from them is still slow because it involves emulating a reverse stochastic process with hundreds-to-thousands of neural network evaluations. Inspired by the recent success of neural operators in accelerating the solution of differential equations, we approach this problem by solving the underlying neural differential equation from an operator learning perspective. We examine probability flow ODE trajectories in diffusion models and observe a compact energy spectrum that can be learned efficiently in Fourier space. With this insight, we propose the diffusion Fourier neural operator (DFNO) with temporal convolution in Fourier space to parameterize the operator that maps the initial condition to the solution trajectory. DFNO can be applied to any diffusion model and generates high-quality samples in one step. Our method achieves the state-of-the-art clean FID of 5.9 (legacy FID 4.72) on CIFAR-10 using one network evaluation.
Hongkai Zheng · Weili Nie · Arash Vahdat · Kamyar Azizzadenesheli · Anima Anandkumar

Score Modeling for Simulation-based Inference (Poster)
Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they may require a large number of simulator calls to yield accurate approximations. Neural Likelihood Estimation methods can naturally handle multiple observations, but require a separate inference step, which may affect their efficiency and performance. We introduce a new method for simulation-based inference that enjoys the benefits of both approaches. We propose to model the scores for the posterior distributions induced by individual observations, and introduce a sampling algorithm that combines the learned scores to approximately sample from the target efficiently. |
Tomas Geffner · George Papamakarios · Andriy Mnih

Improving Conditional Score-Based Generation with Calibrated Classification and Joint Training (Poster)
Score-based Generative Models (SGMs) are a popular family of deep generative models that can achieve leading image generation quality. Earlier works have extended SGMs to tackle class-conditional generation with the guidance of well-trained classifiers. Nevertheless, we find that the classifier-guided SGMs actually do not achieve accurate conditional generation when evaluated with class-conditional measures. We argue that the lack of control stems from inaccurate gradients within the classifiers. We then propose to improve classifier-guided SGMs by calibrating classifiers using principles from energy-based models. In addition, we design a joint-training architecture to further enhance the conditional generation performance. Empirical results on CIFAR-10 demonstrate that the proposed model improves the conditional generation accuracy significantly while maintaining similar generation quality. The results support the potential of memory-efficient SGMs for conditional generation based on classifier guidance.
Paul K. Huang · Si-An Chen · Hsuan-Tien Lin
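For context, the classifier guidance referenced in this abstract combines an unconditional score with a classifier gradient; in its commonly used form with guidance weight $w$ (standard background, not a claim of this paper),

$$
\nabla_x \log p_t(x \mid y) \;\approx\; s_\theta(x, t) + w \, \nabla_x \log p_\phi(y \mid x, t),
$$

so inaccurate classifier gradients $\nabla_x \log p_\phi(y \mid x, t)$ directly corrupt the conditional score used for sampling.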

Dimension reduction via score ratio matching (Poster)
We propose a method to detect a low-dimensional subspace where a non-Gaussian target distribution departs from a known reference distribution (e.g., a standard Gaussian). We identify this subspace from gradients of the log-ratio between the target and reference densities, which we call the score ratio. Given only samples from the target distribution, we estimate these gradients via score ratio matching, with a tailored parameterization and a regularization method that expose the low-dimensional structure we seek. We show that our approach outperforms standard score matching for dimension reduction of in-class distributions, and that several benchmark UCI datasets in fact exhibit this type of low dimensionality. |
Michael Brennan · Ricardo Baptista · Youssef Marzouk

Spectral Diffusion Processes (Poster)
Score-based generative modelling (SGM) has proven to be a very effective method for modelling densities on finite-dimensional spaces. In this work we propose to extend this methodology to learn generative models over functional spaces. To do so, we represent functional data in spectral space to dissociate the stochastic part of the processes from their space-time part. Using dimensionality reduction techniques we then sample from their stochastic component using finite dimensional SGM. We demonstrate our method’s effectiveness for modelling various multimodal datasets. |
Angus Phillips · Thomas Seror · Michael Hutchinson · Valentin De Bortoli · Arnaud Doucet · Emile Mathieu

Convergence of score-based generative modeling for general data distributions (Poster)
We give polynomial convergence guarantees for denoising diffusion models that do not rely on the data distribution satisfying functional inequalities or strong smoothness assumptions. Assuming an $L^2$-accurate score estimate, we obtain Wasserstein distance guarantees for any distributions of bounded support or sufficiently decaying tails, as well as TV guarantees for distributions with further smoothness assumptions.
Holden Lee · Jianfeng Lu · Yixin Tan

A generic diffusion-based approach for 3D human pose prediction in the wild (Poster)
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging spatio-temporal task. It can be more challenging in real-world applications where occlusions will inevitably happen, and estimated 3D coordinates of joints would contain some noise. We provide a unified formulation in which incomplete elements (no matter in the prediction or observation) are treated as noise, and propose a conditional diffusion model that denoises them and forecasts plausible poses. Instead of naively predicting all future frames at once, our model consists of two cascaded sub-models, each specialized for modeling short and long horizon distributions. We also propose a repairing step to improve the performance of any 3D pose forecasting model in the wild, by leveraging our diffusion model to repair the inputs. We investigate our findings on several datasets, and obtain significant improvements over the state of the art. The code will be made available online.
Saeed Saadatnejad · Ali Rasekh · Mohammadreza Mofayezi · Yasamin Medghalchi · Sara Rajabzadeh · Taylor Mordan · Alexandre Alahi

Diffusion Models for Video Prediction and Infilling (Poster)
Video prediction and infilling require strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate the model on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation.
Tobias Höppe · Arash Mehrjou · Stefan Bauer · Didrik Nielsen · Andrea Dittadi

Discovering the Hidden Vocabulary of DALLE-2 (Poster)
We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that |
Giannis Daras · Alex Dimakis

Multiresolution Textual Inversion (Oral)
We extend Textual Inversion to learn pseudo-words that represent a concept at different resolutions. This allows us to generate images that use the concept at different resolutions and also to manipulate different resolutions using language. Once learned, the user can generate images that agree with the original concept at different levels of detail; ``A photo of $S^*(0)$'' produces the exact object while the prompt ``A photo of $S^*(0.8)$'' only matches the rough outlines and colors. Our framework allows us to generate images that use different resolutions of an image (e.g. details, textures, styles) as separate pseudo-words that can be composed in various ways.
Giannis Daras · Alex Dimakis

Neural Volumetric Mesh Generator (Poster)
Deep generative models have shown success in generating 3D shapes with different representations. In this work, we propose the Neural Volumetric Mesh Generator (NVMG), which can generate novel and high-quality volumetric meshes. Unlike previous 3D generative models for point clouds, voxels, and implicit surfaces, the volumetric mesh is a ready-to-use representation in industry with details on both the surface and interior. Generating this kind of highly structured data thus brings a great challenge. To tackle this problem, we first propose to use a diffusion-based generative model to generate voxelized shapes with realistic shape and topology information. With the voxelized shape, we can simply obtain a tetrahedral mesh as a template. Further, we use a voxel-conditional neural network to predict the surface conditioned on the voxels, and progressively project the tetrahedral mesh to the predicted surface under regularization. As shown in the experiments, without any post-processing, our pipeline can generate high-quality artifact-free volumetric and surface meshes.
Yan Zheng · Lemeng Wu · Xingchao Liu · Zhen Chen · Qiang Liu · Qixing Huang

Targeted Separation and Convergence with Kernel Discrepancies (Oral)
Kernel Stein discrepancies (KSDs) are maximum mean discrepancies (MMDs) that leverage the score information of distributions, and have grown central to a wide range of applications. In most settings, these MMDs are required to $(i)$ separate a target $\mathrm{P}$ from other probability measures or even $(ii)$ control weak convergence to $\mathrm{P}$. In this article we derive new sufficient and necessary conditions that substantially broaden the known conditions for KSD separation and convergence control, and develop the first KSDs known to metrize weak convergence to $\mathrm{P}$. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
Alessandro Barp · Carl-Johann Simon-Gabriel · Mark Girolami · Lester Mackey

Using Perturbation to Improve Goodness-of-Fit Tests based on Kernelized Stein Discrepancy (Poster)
Kernelized Stein discrepancy (KSD) is a score-based discrepancy widely employed in goodness-of-fit tests. It can be used even when the target distribution has an unknown normalising factor, such as in Bayesian analysis. We show theoretically and empirically that the power of the KSD test can be low when the target distribution has well-separated modes, which is due to insufficient data in regions where the score functions of the alternative and the target distributions differ the most. To improve its test power, we propose to perturb the target and alternative distributions before applying the KSD test. The perturbation uses a Markov transition kernel that leaves the target invariant but perturbs alternatives. We provide numerical evidence that the proposed approach can lead to a substantially higher power than the KSD test when the target and the alternative are mixture distributions that differ only in mixing weights. |
Xing Liu · Andrew Duncan · Axel Gandy
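For readers unfamiliar with the test statistic above, the following is a minimal sketch of a (V-statistic) kernelized Stein discrepancy with an RBF kernel; the standard-Gaussian target, bandwidth, and sample sizes are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def ksd_rbf(samples, score, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy with
    an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).
    `score` maps an (n, d) array to the target's score grad log p, row-wise."""
    n, d = samples.shape
    s = score(samples)                                    # (n, d) target scores
    diff = samples[:, None, :] - samples[None, :, :]      # (n, n, d), x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)                   # (n, n)
    k = np.exp(-sqdist / (2 * h ** 2))                    # kernel matrix

    term1 = (s @ s.T) * k                                 # s(x)·s(y) k(x, y)
    term2 = np.einsum('id,ijd->ij', s, diff) / h ** 2 * k     # s(x)·grad_y k
    term3 = np.einsum('jd,ijd->ij', s, -diff) / h ** 2 * k    # s(y)·grad_x k
    term4 = (d / h ** 2 - sqdist / h ** 4) * k            # trace(grad_x grad_y k)
    return np.mean(term1 + term2 + term3 + term4)

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
print(ksd_rbf(x, score=lambda z: -z))   # small value: samples match the N(0, I) target
```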

Regularizing Score-based Models with Score Fokker-Planck Equations (Poster)
Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a PDE governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation characterizing the noise-conditional scores of the perturbed data densities (i.e., their gradients), termed the score FPE. Surprisingly, despite impressive empirical performance, we observe that scores learned via denoising score matching (DSM) do not satisfy the underlying score FPE. We mathematically analyze two implications of satisfying the score FPE and a potential explanation for why the score FPE is not satisfied in practice. Finally, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and show its effectiveness on synthetic data and MNIST.
Chieh-Hsin Lai · Yuhta Takida · Naoki Murata · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions (Poster)
We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL-E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretization of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of the CLD does *not* reduce the complexity of SGMs.
Sitan Chen · Sinho Chewi · Jerry Li · Yuanzhi Li · Adil Salim · Anru Zhang
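As background for the sampler analyzed above: an SGM simulates the time reversal of a noising SDE with the learned score plugged into the drift. In the standard formulation (Anderson's reverse-time SDE, not specific to this paper),

$$
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w
\quad\Longleftrightarrow\quad
\mathrm{d}x = \big[f(x, t) - g(t)^{2}\, \nabla_x \log p_t(x)\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w},
$$

where the reverse-time process runs from $t = T$ back to $t = 0$ and $\nabla_x \log p_t$ is replaced by the $L^2$-accurate estimate $s_\theta(x, t)$.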

Fast Sampling of Diffusion Models with Exponential Integrator (Poster)
Our goal is to develop a fast sampling method for Diffusion Models (DMs) with a small number of steps while retaining high sample quality. To achieve this, we systematically analyze the sampling procedure in DMs and identify key factors that affect the sample quality, among which the method of discretization is most crucial. By carefully examining the learned diffusion process, we propose the Diffusion Exponential Integrator Sampler (DEIS). It is based on the Exponential Integrator designed for discretizing ordinary differential equations (ODEs) and leverages a semilinear structure of the learned diffusion process to reduce the discretization error. The proposed method can be applied to any DM and can generate high-fidelity samples in as few as 10 steps. By directly using pre-trained DMs, we achieve superior sampling performance when the number of score function evaluations (NFE) is limited, e.g., 4.17 FID with 10 NFEs, 2.86 FID with only 20 NFEs on CIFAR10.
Qinsheng Zhang · Yongxin Chen

On Distillation of Guided Diffusion Models (Oral)
Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALL$\cdot$E 2, GLIDE and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. On ImageNet 64x64 and CIFAR-10, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from.
Chenlin Meng · Ruiqi Gao · Diederik Kingma · Stefano Ermon · Jonathan Ho · Tim Salimans

An optimal control perspective on diffusion-based generative modeling (Oral)
We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs) such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we develop a novel diffusion-based method for sampling from unnormalized densities -- a problem frequently occurring in statistics and computational sciences.
Julius Berner · Lorenz Richter · Karen Ullrich

Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation (Poster)
Planning a motion in a cluttered environment is a recurring task autonomous agents need to solve. This paper presents a first attempt to learn generative models for collision-free trajectory generation based on conditioned score-based models. Given multiple navigation tasks, environment maps and collision-free trajectories pre-computed with a sample-based planner, using a signed distance function loss we learn a vision encoder of the map and use its embedding to learn a conditioned score-based model for trajectory generation. A novelty of our method is to integrate in a temporal U-net architecture, using a cross-attention mechanism, conditioning variables such as the latent representation of the environment and task features. We validate our approach in a simulated 2D planar navigation toy task, where a robot needs to plan a path that avoids obstacles in a scene. |
Joao Carvalho · Mark Baierl · Julen Urain · Jan Peters

Convergence in KL and Rényi Divergence of the Unadjusted Langevin Algorithm Using Estimated Score (Poster)
We study the Unadjusted Langevin Algorithm (ULA) for sampling using an estimated score function when the target distribution satisfies a log-Sobolev inequality (LSI), motivated by Score-based Generative Modeling (SGM). We prove convergence in Kullback-Leibler (KL) divergence under a minimal sufficient assumption on the error of the score estimator, called the bounded Moment Generating Function (MGF) assumption. Our assumption is weaker than the previous assumption, which requires a finite $L^\infty$ norm of the error. Under the $L^\infty$ error assumption, we also prove convergence in R\'enyi divergence, which is stronger than KL divergence. On the other hand, under an $L^p$ error assumption for any $1 \leq p < \infty$, which is weaker than the bounded MGF assumption, we show that the stationary distribution of Langevin dynamics with an $L^p$-accurate score estimator can be arbitrarily far away from the desired distribution. Thus having an $L^p$-accurate score estimator cannot guarantee convergence. Our results suggest that controlling the mean squared error, which is the form of loss commonly used when a neural network estimates the score function, is not enough to guarantee that the upstream algorithm will converge; hence, in order to get a theoretical guarantee, we need stronger control over the error in score matching. Despite requiring an exponentially decaying error probability, we give an example to demonstrate that the bounded MGF assumption is achievable when using a Kernel Density Estimation (KDE)-based score estimator.
Kaylee Y. Yang · Andre Wibisono
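To make the algorithm in this abstract concrete, here is a minimal sketch of the Unadjusted Langevin Algorithm driven by a score estimate; the Gaussian target, step size, and score_estimate placeholder are illustrative assumptions rather than the construction analyzed in the paper.

```python
import numpy as np

def score_estimate(x):
    # Placeholder score estimate: exact score of a standard Gaussian target,
    # grad log p(x) = -x. In practice this would be a learned or KDE-based estimate.
    return -x

def ula(n_steps=1000, step=0.01, dim=2, seed=0):
    """ULA update: x_{k+1} = x_k + step * score(x_k) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    samples = []
    for _ in range(n_steps):
        x = x + step * score_estimate(x) + np.sqrt(2.0 * step) * rng.standard_normal(dim)
        samples.append(x.copy())
    return np.array(samples)

samples = ula()
print(samples.mean(axis=0), samples.var(axis=0))  # roughly zero mean, unit variance
```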

On RKHS Choices for Assessing Graph Generators via Kernel Stein Statistics (Poster)
Score-based kernelised Stein discrepancy (KSD) tests have emerged as a powerful tool for goodness-of-fit tests, especially in high dimensions; however, the test performance may depend on the choice of kernels in an underlying reproducing kernel Hilbert space (RKHS). Here we assess the effect of RKHS choice for KSD tests of random network models, developed for exponential random graph models (ERGMs) in Xu and Reinert (2021) and for synthetic graph generators in Xu and Reinert (2022). We investigate the power performance and the computational runtime of the test in different scenarios, including both dense and sparse graph regimes. Experimental results on kernel performance for model assessment tasks are shown and discussed on synthetic and real-world network applications.
Wenkai Xu · Gesine D Reinert · Moritz Weckbecker

Proposal of a Score Based Approach to Sampling Using Monte Carlo Estimation of Score and Oracle Access to Target Density (Poster)
Score based approaches to sampling have shown much success as a generative algorithm to produce new samples from a target density given a pool of initial samples. In this work, we consider if we have no initial samples from the target density, but rather $0^{th}$ and $1^{st}$ order oracle access to the log likelihood. Such problems may arise in Bayesian posterior sampling, or in training a network from data. Using this knowledge alone, we propose a Monte Carlo method to estimate the score empirically as a particular expectation of a random variable. Using this estimator, we can then run a discrete version of the backward flow SDE to produce samples from the target density. This approach has the benefit of not relying on a pool of initial samples from the target density, and it does not rely on a neural network or other black box model to estimate the score.
Curtis McDonald · Andrew Barron

Diffusion Prior for Online Decision Making: A Case Study of Thompson Sampling (Poster)
In this work, we investigate the possibility of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandits framework, with the goal of learning a strategy that performs well across bandit tasks of the same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. Preliminary experiments clearly demonstrate the potential of the considered approach.
Yu-Guan Hsieh · Shiva Kasiviswanathan · Branislav Kveton · Patrick Blöbaum

Scalable Causal Discovery with Score Matching (Poster)
This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \operatorname{log}p(\mathbf{X})$, we extend the work of Rolland et al., 2022, that only recovers the topological order from the score and requires an expensive pruning step to discover the edges. Our analysis leads to DAS, a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with the current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.
Francesco Montagna · Nicoletta Noceti · Lorenzo Rosasco · Kun Zhang · Francesco Locatello

All are Worth Words: a ViT Backbone for Score-based Diffusion Models (Poster)
Vision transformers (ViT) have shown promise in various vision tasks, including low-level ones, while the U-Net remains dominant in score-based diffusion models. In this paper, we perform a systematic empirical study of ViT-based architectures in diffusion models. Our results suggest that adding extra long skip connections (like those in the U-Net) to ViT is crucial to diffusion models. The new ViT architecture, together with other improvements, is referred to as U-ViT. On several popular visual datasets, U-ViT achieves generation results competitive with the SOTA U-Net while requiring a comparable, if not smaller, amount of parameters and computation.
Fan Bao · Chongxuan LI · Yue Cao · Jun Zhu

Fine-tuning Diffusion Models with Limited Data (Poster)
Diffusion models have recently shown remarkable progress, demonstrating state-of-the-art image generation qualities. Like the other high-fidelity generative models, diffusion models require a large amount of data and computing time for stable training, which hinders the application of diffusion models for limited data settings. To overcome this issue, one can employ a pre-trained diffusion model built on a large-scale dataset and fine-tune it on a target dataset. Unfortunately, as we show empirically, this easily results in overfitting. In this paper, we propose an efficient fine-tuning algorithm for diffusion models that can efficiently and robustly train on limited data settings. We first show that fine-tuning only the small subset of the pre-trained parameters can efficiently learn the target dataset with much less overfitting. Then we further introduce a lightweight adapter module that can be attached to the pre-trained model with minimal overhead and show that fine-tuning with our adapter module significantly improves the image generation quality. We demonstrate the effectiveness of our method on various real-world image datasets. |
Taehong Moon · Moonseok Choi · Gayoung Lee · Jung-Woo Ha · Juho Lee

JPEG Artifact Correction using Denoising Diffusion Restoration Models (Poster)
Diffusion models can be used as learned priors for solving various inverse problems. However, most existing approaches are restricted to linear inverse problems, limiting their applicability to more general cases. In this paper, we build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems. We leverage the pseudo-inverse operator used in DDRM and generalize this concept for other measurement operators, which allows us to use pre-trained unconditional diffusion models for applications such as JPEG artifact correction. We empirically demonstrate the effectiveness of our approach across various quality factors, attaining performance levels that are on par with state-of-the-art methods trained specifically for the JPEG restoration task. |
Bahjat Kawar · Jiaming Song · Stefano Ermon · Michael Elad

Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation (Poster)
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modelling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialised hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the framework effectiveness. The proposed method, in particular, still generates high-quality molecular graphs in a limited number of steps. |
Han Huang · Leilei Sun · Bowen Du · Weifeng Lv

Denoising Diffusion for Sampling SAT Solutions (Poster)
Generating diverse solutions to the Boolean Satisfiability Problem (SAT) is a hard computational problem with practical applications for testing and functional verification of software and hardware designs. We explore the way to generate such solutions using Denoising Diffusion coupled with a Graph Neural Network to implement the denoising function. We find that the obtained accuracy is similar to the currently best purely neural method and the produced SAT solutions are highly diverse even if the system is trained with non-random solutions from a standard solver. |
Karlis Freivalds · Sergejs Kozlovičs

When are equilibrium networks scoring algorithms? (Poster)
Principal Component Analysis (PCA) and its exponential family extensions have three components: observed variables, latent variables and parameters of a linear transformation. The likelihood of the observation is an exponential family with canonical parameters that are a linear transformation of the latent variables. We show how joint maximum a-posteriori (MAP) estimates can be computed using a deep equilibrium model that computes roots of the score function. Our analysis provides a systematic way to relate neural network activation functions back to statistical assumptions about the observations. Our layers are implicitly differentiable, and can be fine-tuned in downstream tasks, as demonstrated on a synthetic task. |
Russell Tsuchida · Cheng Soon Ong

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Poster)
We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions $\pi_0$ and $\pi_1$, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from $\pi_0$ and $\pi_1$ as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of $\pi_0$ and $\pi_1$ to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation and image-to-image translation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with \emph{a single Euler discretization step}.
Xingchao Liu · Chengyue Gong · Qiang Liu
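A minimal sketch of the regression step described above, on a toy 2D problem: sample a pair $(x_0, x_1)$, pick a random time, and regress a velocity network onto the straight-line displacement $x_1 - x_0$. The network size, optimizer settings, and toy endpoint distributions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 2
# Toy endpoint distributions: pi_0 = N(0, I), pi_1 = N((4, 4), I).
x0 = torch.randn(4096, d)
x1 = torch.randn(4096, d) + 4.0

# v_theta(x, t) takes the state concatenated with the time.
velocity = nn.Sequential(nn.Linear(d + 1, 64), nn.SiLU(), nn.Linear(64, d))
opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)

for step in range(2000):
    idx = torch.randint(0, x0.shape[0], (256,))
    a, b = x0[idx], x1[idx]
    t = torch.rand(256, 1)
    xt = (1 - t) * a + t * b                 # point on the straight path
    target = b - a                           # constant velocity of that path
    pred = velocity(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()     # nonlinear least squares objective
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v(x, t) from t = 0 to 1 with Euler steps.
with torch.no_grad():
    x = torch.randn(5, d)
    for k in range(100):
        t = torch.full((5, 1), k / 100.0)
        x = x + velocity(torch.cat([x, t], dim=1)) / 100.0
print(x)   # samples should land near (4, 4)
```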

Let us Build Bridges: Understanding and Extending Diffusion Generative Models (Poster)
Diffusion-based generative models have achieved promising results recently, but raise an array of open questions in terms of conceptual understanding, theoretical analysis, algorithm improvement and extensions to discrete, structured, non-Euclidean domains. This work tries to re-examine the overall framework, in order to gain better theoretical understanding and develop algorithmic extensions for data from arbitrary domains. By viewing diffusion models as latent variable models with unobserved diffusion trajectories and applying maximum likelihood estimation (MLE) with latent trajectories imputed from an auxiliary distribution, we show that both the model construction and the imputation of latent trajectories amount to constructing diffusion bridge processes that achieve deterministic values and constraints at the end point, for which we provide a systematic study and a suite of tools. Leveraging our framework, we present a simple and unified approach to learning on data from different discrete and constrained domains. Experiments show that our methods perform superbly on generating images and semantic segments.
Xingchao Liu · Lemeng Wu · Mao Ye · Qiang Liu

Improved Marginal Unbiased Score Expansion (MUSE) via Implicit Differentiation (Poster)
We apply the technique of implicit differentiation to boost performance, reduce numerical error, and remove required user-tuning in the Marginal Unbiased Score Expansion (MUSE) algorithm for hierarchical Bayesian inference. We demonstrate these improvements on three representative inference problems: 1) an extended Neal's funnel 2) Bayesian neural networks, and 3) probabilistic principal component analysis. On our particular test cases, MUSE with implicit differentiation is faster than Hamiltonian Monte Carlo by factors of 155, 397, and 5, respectively, or factors of 65, 278, and 1 without implicit differentiation, and yields good approximate marginal posteriors. The Julia and Python MUSE packages have been updated to use implicit differentiation, and can solve problems defined by hand or with any of a number of popular probabilistic programming languages and automatic differentiation backends. |
Marius Millea

Few-Shot Diffusion Models (Poster)
Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence. |
Giorgio Giannone · Didrik Nielsen · Ole Winther

Batch Denoising via Blahut-Arimoto (Poster)
In this work, we propose to solve batch denoising using the Blahut-Arimoto algorithm (BA). Batch denoising via BA (BDBA), similar to Deep Image Prior (DIP), is based on an untrained score-based generative model. Theoretical results show that our denoising estimate is highly likely to be close to the best result. Experimentally, we show that BDBA outperforms DIP significantly.
Qing Li · Cyril Guyot

Why Are Conditional Generative Models Better Than Unconditional Ones? (Poster)
Extensive empirical evidence demonstrates that conditional generative models are easier to train and perform better than unconditional ones by exploiting the labels of data. So do score-based diffusion models. In this paper, we analyze the phenomenon formally and identify that the key of conditional learning is to partition the data properly. Inspired by the analyses, we propose self-conditioned diffusion models (SCDM), which is trained conditioned on indices clustered by the $k$-means algorithm on the features extracted by a model pre-trained in a self-supervised manner. SCDM significantly improves the unconditional model across various datasets and achieves a record-breaking FID of 3.94 on ImageNet 64x64 without labels. Besides, SCDM achieves a slightly better FID than the corresponding conditional model on CIFAR10.
Fan Bao · Chongxuan LI · Jiacheng Sun · Jun Zhu
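A minimal sketch of the conditioning signal described above: cluster self-supervised features with $k$-means and treat the cluster index as a pseudo-label for a conditional diffusion model. The random features, number of clusters, and use of scikit-learn below are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for features from a self-supervised encoder (one vector per image).
rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 64))

# Cluster the features; the cluster index then plays the role of a class label
# that a conditional diffusion model can be trained on.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
pseudo_labels = kmeans.fit_predict(features)   # shape (2000,), values in {0, ..., 49}
print(np.bincount(pseudo_labels)[:10])
```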

Particle-based Variational Inference with Preconditioned Functional Gradient Flow (Poster)
Particle-based variational inference (VI) minimizes the KL divergence between model samples and the target posterior with gradient flow estimates. With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in a Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow. However, the requirement of RKHS restricts the function class and algorithmic flexibility. This paper remedies the problem by proposing a general framework to obtain tractable functional gradient flow estimates. The functional gradient flow in our framework can be defined by a general functional regularization term that includes the RKHS norm as a special case. We also use our framework to propose a new particle-based VI algorithm: \emph{preconditioned functional gradient flow} (PFG). Compared with SVGD, the proposed preconditioned functional gradient method has several advantages: larger function classes; greater scalability in large particle-size scenarios; better adaptation to ill-conditioned target distributions; provable continuous-time convergence in KL divergence. Both theoretical and experimental results have shown the effectiveness of our framework.
Hanze Dong · Xi Wang · Yong Lin · Tong Zhang

Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds" (Poster)
Recent text-to-image diffusion models trained on large-scale data achieve remarkable performance on text-conditioned image synthesis (e.g., GLIDE, DALL∙E 2, Imagen, Stable Diffusion). This paper presents an embarrassingly simple method to use these text-to-image diffusion models as zero-shot image-to-image editors. Our method, CycleDiffusion, is based on a recent finding that, when the "random seed" is fixed, sampling from two diffusion model distributions will produce images with minimal differences, and the core of our idea is to infer the "random seed" that is likely to produce a source image conditioned on a source text. We formalize the "random seed" as a sequence of isometric Gaussian noises that we reformulate as diffusion models' latent code. Using the "random seed" inferred from the source text-image pair, we generate a target image conditioned on a target text. Experiments show that CycleDiffusion can minimally edit the image in a zero-shot manner.
Chen Henry Wu · Fernando D De la Torre

Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis (Poster)
Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although the simple generative process surprisingly works well, is this the best way to generate image data? For instance, despite the fact that human perception is more sensitive to the low frequencies of an image, diffusion models themselves do not consider any relative importance of each frequency component. Therefore, to incorporate the inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize the standard diffusion models by enabling diffusion in a rotated coordinate system with different velocities for each component of the vector. We further propose a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds. Specifically, the proposed blur diffusion consists of a forward process that blurs an image and adds noise gradually, after which a corresponding reverse process deblurs an image and removes noise progressively. Experiments show that the proposed model outperforms the previous method in FID on the LSUN bedroom and church datasets.
Sangyun Lee · Hyungjin Chung · Jaehyeon Kim · Jong Chul Ye

Towards Healing the Blindness of Score Matching (Poster)
Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches. |
Mingtian Zhang · Oscar Key · Peter Hayes · David Barber · Brooks Paige · Francois-Xavier Briol