The score function, which is the gradient of the log-density, provides a unique way to represent probability distributions. By working with distributions through their score functions, researchers have been able to develop efficient tools for machine learning and statistics, collectively known as score-based methods.
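As a concrete illustration (a minimal sketch of our own, not part of the workshop materials): for a Gaussian density the score has a simple closed form, and it can be checked against a finite-difference gradient of the log-density. Note that the score is unchanged if the density is unnormalized, which is exactly why score-based methods can sidestep normalizing constants.

```python
import numpy as np

# Illustrative example: the score of N(mu, sigma^2).
# The score is the gradient of the log-density: d/dx log p(x) = -(x - mu) / sigma^2.

def log_density(x, mu=1.0, sigma=2.0):
    # log N(x; mu, sigma^2)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def score(x, mu=1.0, sigma=2.0):
    # closed-form score: -(x - mu) / sigma^2
    return -(x - mu) / sigma ** 2

# sanity check: the score matches a central finite difference of the log-density
x, eps = 0.3, 1e-5
fd = (log_density(x + eps) - log_density(x - eps)) / (2 * eps)
assert abs(fd - score(x)) < 1e-6
```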
Score-based methods have had a significant impact on largely disjoint subfields of machine learning and statistics, such as generative modeling, Bayesian inference, hypothesis testing, control variates, and Stein's methods. For example, score-based generative models, also known as denoising diffusion models, have emerged as the state-of-the-art technique for generating high-quality and diverse images. In addition, recent developments in Stein's method and score-based approaches to stochastic differential equations (SDEs) have contributed to the development of fast and robust Bayesian posterior inference in high dimensions. These advances have potential applications in engineering fields, where they could help improve simulation models.
At our workshop, we will bring together researchers from these various subfields to discuss the successes of score-based methods and identify common challenges across research areas. We will also explore the potential for applying score-based methods to even more real-world applications, including in computer vision, signal processing, and computational chemistry. By doing so, we hope to foster collaboration among researchers and build a more cohesive research community focused on score-based methods.
Fri 6:50 a.m. – 7:00 a.m.  Introduction and opening remarks (Introduction)

Fri 7:00 a.m. – 7:30 a.m.  Invited Talk: Karsten Kreis (Talk)

Fri 7:30 a.m. – 8:00 a.m.  Invited Talk: Tommi Jaakkola (Talk)

Fri 8:00 a.m. – 9:00 a.m.  Poster session 1 (Poster) – Yingzhen Li

Fri 9:00 a.m. – 10:00 a.m.  Panel discussion (Discussion Panel)

Fri 10:00 a.m. – 10:30 a.m.  Contributed talk session 1 (Talk)

Fri 10:30 a.m. – 11:30 a.m.  Lunch break

Fri 11:30 a.m. – 12:00 p.m.  Invited Talk: Guan-Horng Liu (Talk)

Fri 12:00 p.m. – 12:30 p.m.  Invited Talk: Tamara Fernandez (Talk)

Fri 12:30 p.m. – 1:00 p.m.  Contributed talk session 2 (Talk)

Fri 1:00 p.m. – 2:00 p.m.  Poster session 2 (Poster)

Fri 2:00 p.m. – 2:30 p.m.  Invited Talk: Chenlin Meng (Talk)

Fri 2:30 p.m. – 3:00 p.m.  Invited Talk: Mohammad Norouzi (Talk)


Modeling Temporal Data as Continuous Functions with Process Diffusion (Poster)
Temporal data such as time series are often observed at irregular intervals, which is a challenging setting for existing machine learning methods. To tackle this problem, we view such data as samples from some underlying continuous function. We then define a diffusion-based generative model that adds noise from a predefined stochastic process while preserving the continuity of the resulting underlying function. A neural network is trained to reverse this process, which allows us to sample new realizations from the learned distribution. We define suitable stochastic processes as noise sources and introduce novel denoising and score-matching models on processes. Further, we show how to apply this approach to multivariate probabilistic forecasting and imputation tasks. Through extensive experiments, we demonstrate that our method outperforms previous models on synthetic and real-world datasets.
Marin Biloš · Kashif Rasul · Anderson Schneider · Yuriy Nevmyvaka · Stephan Günnemann


Self-Guided Diffusion Model (Poster)
Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, such guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness, and unbiasedness. In this paper, we aim to eliminate the need for such annotation by instead leveraging the flexibility of self-supervision signals to design a framework for self-guided diffusion models. By leveraging a feature extraction function and a self-annotation function, our method provides flexible guidance signals at various image granularities: from the level of holistic images to object boxes and even segmentation masks.
TAO HU · David Zhang · Yuki Asano · Gertjan Burghouts · Cees Snoek


Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition (Poster)
Combining predictive distributions is a central problem in Bayesian inference and machine learning. Currently, predictives are almost exclusively combined using linear density mixtures such as Bayesian model averaging, Bayesian stacking, and mixtures of experts. Nonetheless, linear mixtures impose traits that might be undesirable for some applications, such as multimodality. While there are alternative strategies (e.g., geometric bridges or superposition), optimizing their parameters usually implies repeatedly computing an intractable normalizing constant. In this extended abstract, we present two novel Bayesian model combination tools. They are generalizations of stacking, but combine posterior densities by log-linear pooling (locking) and quantum superposition (quacking). To optimize model weights while avoiding the burden of normalizing constants, we maximize the Hyvärinen score of the combined posterior predictions. We demonstrate locking and quacking with an illustrative example.
Yuling Yao · Luiz Carvalho · Diego Mesquita
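As a minimal numerical sketch of the log-linear pooling idea (our own illustration with hypothetical helper names, not the authors' implementation): the pool p(x) ∝ ∏_k p_k(x)^{w_k} can be normalized numerically on a grid, and for two well-separated Gaussians the equal-weight log pool is unimodal, unlike the bimodal linear mixture.

```python
import numpy as np

def log_pool(densities, weights, grid):
    # log-linear pooling: p(x) proportional to prod_k p_k(x)^{w_k},
    # normalized numerically on a grid (sidestepping the intractable constant)
    log_p = sum(w * np.log(d(grid)) for d, w in zip(densities, weights))
    p = np.exp(log_p - log_p.max())
    dx = grid[1] - grid[0]
    return p / (p.sum() * dx)

def gaussian(m, s):
    return lambda x: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

grid = np.linspace(-10, 10, 2001)
pooled = log_pool([gaussian(-2, 1), gaussian(2, 1)], [0.5, 0.5], grid)

# unlike the linear mixture of these two components (bimodal),
# the equal-weight log pool is a single Gaussian centered at 0
n_modes = np.sum((pooled[1:-1] > pooled[:-2]) & (pooled[1:-1] > pooled[2:]))
assert n_modes == 1
```

For Gaussians this can be verified analytically: the equal-weight log pool of N(-2, 1) and N(2, 1) is exactly N(0, 1).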


Action Matching: A Variational Method for Learning Stochastic Dynamics from Samples (Poster)
Stochastic dynamics are ubiquitous in many fields of science, from the evolution of quantum systems in physics to diffusion-based models in machine learning. Existing methods such as score matching can be used to simulate these physical processes by assuming that the dynamics are a diffusion, which is not always the case. In this work, we propose a method called "Action Matching" that enables us to learn a much broader family of stochastic dynamics. Our method requires access only to samples from different timesteps, makes no explicit assumptions about the underlying dynamics, and can be applied even when samples are uncorrelated (i.e., are not part of a trajectory). Action Matching directly learns the underlying mechanism that moves samples in time without modeling the distributions at each timestep. In this work, we showcase how Action Matching can be used for generative modeling in computer vision tasks and discuss potential applications in other areas of science.
Kirill Neklyudov · Daniel Severo · Alireza Makhzani


Likelihood Score under Generalized Self-Concordance (Poster)
We show how, under a generalized self-concordance assumption and possible model misspecification, we can establish non-asymptotic bounds on the normalized likelihood score when using maximum likelihood or score matching. The tail behavior is governed by an effective dimension corresponding to the trace of the sandwich covariance. We also show how our non-asymptotic approach allows us to obtain confidence sets for the estimator and analyze Rao's score test.
Lang Liu · Zaid Harchaoui


Noise-conditional Maximum Likelihood Estimation with Score-based Sampling (Poster)
We introduce a simple yet effective modification to the standard maximum likelihood estimation (MLE) framework for autoregressive generative models. Rather than maximizing a single unconditional likelihood of the data under the model, we maximize a family of noise-conditional likelihoods consisting of the data perturbed by a continuum of noise levels. We find that models trained this way are more robust to noise, obtain higher test likelihoods, and generate higher-quality images. They can also be sampled from via a novel score-based sampling scheme which combats the classical covariate-shift problem that occurs during sample generation in autoregressive models. Applying this augmentation to autoregressive image models, we obtain 3.32 bits per dimension on the ImageNet 64x64 dataset, and substantially improve the quality of generated samples in terms of the Fréchet Inception Distance (FID), from 37.50 to 13.50 on the CIFAR-10 dataset.
Henry Li · Yuval Kluger


Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models (Poster)
Diffusion and score-based generative models have achieved remarkable sample quality on difficult image synthesis tasks. Many works have proposed samplers for pretrained diffusion models, including ancestral samplers, SDE and ODE integrators, and annealed MCMC approaches. So far, the best sample quality has been achieved with samplers that use time-conditional score functions and move between several noise levels. However, estimating an accurate score function at many noise levels can be challenging and requires an architecture that is more expressive than would be needed for a single noise level. In this work, we explore MCMC sampling algorithms that operate at a single noise level, yet synthesize images with acceptable sample quality on the CIFAR-10 dataset. We show that while naive application of Langevin dynamics and a related noise-denoise sampler produces poor samples, methods built on integrators of underdamped Langevin dynamics using splitting methods can perform well. Further, by combining MCMC methods with existing multiscale samplers, we begin to approach competitive sample quality without using scores at large noise levels.
Ajay Jain · Ben Poole
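To make the "BAOAB" terminology concrete, here is a minimal sketch (our own illustration, not the authors' image sampler) of the BAOAB splitting of underdamped Langevin dynamics, sampling a 1-D standard Gaussian whose score is -x, so the potential gradient is x. Step size and friction are arbitrary choices for the demo.

```python
import numpy as np

def baoab(grad_U, x0, n_steps, step=0.1, gamma=1.0, seed=0):
    # BAOAB splitting of underdamped Langevin dynamics (unit mass and temperature):
    # B = half velocity kick, A = half position drift, O = exact Ornstein-Uhlenbeck step
    rng = np.random.default_rng(seed)
    c1 = np.exp(-gamma * step)
    c2 = np.sqrt(1.0 - c1 ** 2)
    x, v = x0, 0.0
    xs = np.empty(n_steps)
    for i in range(n_steps):
        v -= 0.5 * step * grad_U(x)               # B
        x += 0.5 * step * v                       # A
        v = c1 * v + c2 * rng.standard_normal()   # O
        x += 0.5 * step * v                       # A
        v -= 0.5 * step * grad_U(x)               # B
        xs[i] = x
    return xs

# target N(0, 1): U(x) = x^2 / 2, so grad_U(x) = x (the negative score)
samples = baoab(lambda x: x, x0=0.0, n_steps=200_000)
assert abs(samples.mean()) < 0.1
assert abs(samples.var() - 1.0) < 0.1
```

The O-step uses the exact solution of the Ornstein-Uhlenbeck process for the velocity, which is what gives BAOAB its favorable configurational accuracy.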


First hitting diffusion models (Poster)
We propose a family of First Hitting Diffusion Models (FHDM), deep generative models that generate data with a diffusion process that terminates at a random first hitting time. This yields an extension of the standard fixed-time diffusion models that terminate at a pre-specified deterministic time. Although standard diffusion models are designed for continuous unconstrained data, FHDM is naturally designed to learn distributions on continuous as well as a range of discrete and structured domains.
Mao Ye · Lemeng Wu · Qiang Liu


Score-based generative models learn manifold-like structures with constrained mixing (Poster)
How do score-based generative models (SBMs) learn the data distribution supported on a lower-dimensional manifold? We investigate the score model of a trained SBM through its linear approximations and subspaces spanned by local feature vectors. During diffusion, as the noise decreases, the local dimensionality increases and becomes more varied between different sample sequences. Importantly, we find that the learned vector field mixes images by a non-conservative field within the manifold, although it denoises with normal projections, as if there were a potential function in off-manifold directions. At each noise level, the subspace spanned by the local features overlaps with an effective density function. These observations suggest that SBMs can flexibly mix samples with the learned score field while carefully maintaining a manifold-like structure of the data distribution.
Li Kevin Wenliang · Ben Moran


Exploring the Design Space of Generative Diffusion Processes for Sparse Graphs (Poster)
We extend score-based generative diffusion processes (GDPs) to sparse graphs and other inherently discrete data, with a focus on scalability. GDPs apply diffusion to training samples, then learn a reverse process generating new samples out of noise. Previous work applying GDPs to discrete data effectively relaxes discrete variables to continuous ones. Our approach is different: we consider jump diffusion (i.e., diffusion with punctual discontinuities) in $\mathbb{R}^d \times \mathcal{G}$, where $\mathcal{G}$ models discrete components of the data. We focus our attention on sparse graphs: our Dissolve process gradually breaks apart a graph $(V,E) \in \mathcal{G}$ in a certain number of distinct jump events. This confers significant advantages compared to GDPs that use less efficient representations and/or that destroy the graph information in a sudden manner. Gaussian kernels allow for efficient training with denoising score matching; standard GDP methods can be adapted with just an extra argument to the score function. We consider improvement opportunities for Dissolve and discuss necessary conditions to generalize to other kinds of inherently discrete data.
Pierre-André Noël · Pau Rodriguez


Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning (Poster)
Model-based reinforcement learning (MBRL) exhibits favorable performance in practice, but its theoretical guarantees are mostly restricted to settings where the transition model is Gaussian or Lipschitz, and it demands a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method that (i) relaxes the assumptions on the target transition model to belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits a Bayesian regret of $\mathcal{O}(dH^{1+(\alpha/2)}T^{1-(\alpha/2)})$ with coreset size $\Omega(\sqrt{T^{1+\alpha}})$, where $d$ is the aggregate dimension of the state-action space, $H$ is the episode length, $T$ is the total number of time steps experienced, and $\alpha \in (0,1]$ is a tuning parameter whose introduction into the analysis of MBRL is novel to this work. To achieve these results, we adopt an approach based upon Stein's method, which allows the distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to a 50% reduction in wall-clock time in some continuous control environments.
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Pratap Tokekar · Dinesh Manocha


Unsupervised Controllable Generation with Score-based Diffusion Models: Disentangled Latent Code Guidance (Poster)
Owing to their impressive empirical success, score-based diffusion models have recently been in the spotlight among generative models. In real-world applications, controllable generation enriches the impact of diffusion models. This paper addresses that challenge by presenting a method for control in an unsupervised manner. We propose the Latent Code Guidance Diffusion Model (LCGDM), which is the first approach to apply disentanglement to score-based diffusion models. A disentangled latent code can be considered a pseudo-label, since it separately expresses semantic information in each dimension. LCGDM is a score-based diffusion model that reflects the disentangled latent code as the condition. LCGDM shows the best performance among baselines in terms of both sample quality and disentanglement on the dSprites dataset. LCGDM can manipulate images on the CelebA dataset, with FID performance comparable to non-disentangling score-based diffusion models. Furthermore, we provide experimental results on the MNIST dataset for a scaling method that puts more weight on the pseudo-label.
Yeongmin Kim · Dongjun Kim · Hyeonmin Lee · Il-Chul Moon


Molecular Docking with Diffusion Generative Models (Oral)
Predicting the binding structure of a small molecule to a protein, a task known as molecular docking, is critical to drug design. Recent deep learning methods that frame docking as a regression problem have yet to offer substantial improvements over traditional search-based methods. We identify the drawbacks of a regression-based approach and instead view molecular docking as a generative modeling problem. We develop DockDiff, a novel diffusion process and generative model over the main degrees of freedom involved during docking. Empirically, DockDiff obtains a 37% top-1 success rate (RMSD < 2Å) on PDBBind, significantly outperforming the previous state of the art of traditional docking (23%) and deep learning (20%) methods.
Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola


Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models (Poster)
Generative models based on denoising diffusion techniques have led to an unprecedented increase in the quality and diversity of imagery that it is now possible to create with neural generative models. However, most contemporary state-of-the-art methods are derived from a standard isotropic Gaussian formulation. In this work we examine the setting where non-isotropic Gaussian distributions are used instead. We present the key mathematical derivations for creating denoising diffusion models with an underlying non-isotropic Gaussian noise model. We also provide initial experiments to help verify empirically that this more general modeling approach can also yield high-quality samples.
Vikram Voleti · Chris Pal · Adam Oberman


Statistical Efficiency of Score Matching: The View from Isoperimetry (Oral)
Deep generative models parametrized up to a normalizing constant (e.g., energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or its gradients cannot be written down explicitly or efficiently. Score matching is a training method whereby, instead of fitting the likelihood $\log p(x)$ of the training data, we fit the score function $\nabla_x \log p(x)$, obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it is unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood, which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between the statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated, i.e., its Poincaré, log-Sobolev, and isoperimetric constants: quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant, even for simple families such as exponential families with rich enough sufficient statistics, score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite-sample regime and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudo-likelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
Frederic Koehler · Alexander Heckett · Andrej Risteski
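As a small self-contained illustration of score matching in its denoising form (our own sketch with hypothetical variable names, not this paper's estimator): for Gaussian data perturbed with Gaussian noise, fitting a linear score model by the denoising score matching regression recovers the score of the perturbed density, which is known in closed form and can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200_000, 0.5
x = rng.standard_normal(n)         # data ~ N(0, 1)
eps = rng.standard_normal(n)
x_noisy = x + sigma * eps          # perturbed data ~ N(0, 1 + sigma^2)

# denoising score matching regresses the target (x - x_noisy) / sigma^2 = -eps / sigma
# onto the noisy inputs; for a linear model s(x) = a * x + b this is least squares
target = -eps / sigma
A = np.stack([x_noisy, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, target, rcond=None)

# the minimizer is the score of the *perturbed* density: -x / (1 + sigma^2)
assert abs(a - (-1.0 / (1.0 + sigma ** 2))) < 0.02
assert abs(b) < 0.02
```

The same regression, with the linear model replaced by a neural network, is the standard training objective of diffusion models.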


Score-Based Generative Models with Lévy Processes (Poster)
Time reversibility of stochastic processes is a primary cornerstone of score-based generative models through stochastic differential equations (SDEs). While a broader class of Markov processes is reversible, previous continuous-time approaches restrict the range of noise processes to Brownian motion (BM), since the closed form of the time-reversal formula is only known for diffusion processes. In this paper, to expand the class of noise distributions, we propose a class of score-based probabilistic generative models, the Lévy–Itô Model (LIM), which utilizes the $\alpha$-stable distribution for noise injection. To this end, we derive an approximate time-reversal formula for SDEs with Lévy processes, which allows discontinuous pure-jump motion. Consequently, we advance score-based generative models to a broad range of non-Gaussian Markov processes. Empirical results on MNIST, CIFAR-10, CelebA, and CelebA-HQ show that our approach is valid.
Eunbi Yoon · Keehun Park · Jinhyeok Kim · Sungbin Lim


Fast Sampling of Diffusion Models via Operator Learning (Poster)
Diffusion models have found widespread adoption in various areas. However, sampling from them is still slow because it involves emulating a reverse stochastic process with hundreds to thousands of neural network evaluations. Inspired by the recent success of neural operators in accelerating the solution of differential equations, we approach this problem by solving the underlying neural differential equation from an operator learning perspective. We examine probability flow ODE trajectories in diffusion models and observe a compact energy spectrum that can be learned efficiently in Fourier space. With this insight, we propose the diffusion Fourier neural operator (DFNO) with temporal convolution in Fourier space to parameterize the operator that maps the initial condition to the solution trajectory. DFNO can be applied to any diffusion model and generates high-quality samples in one step. Our method achieves a state-of-the-art clean FID of 5.9 (legacy FID 4.72) on CIFAR-10 using one network evaluation.
Hongkai Zheng · Weili Nie · Arash Vahdat · Kamyar Azizzadenesheli · Anima Anandkumar


Score Modeling for Simulation-based Inference (Poster)
Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they may require a large number of simulator calls to yield accurate approximations. Neural Likelihood Estimation methods can naturally handle multiple observations, but require a separate inference step, which may affect their efficiency and performance. We introduce a new method for simulation-based inference that enjoys the benefits of both approaches. We propose to model the scores of the posterior distributions induced by individual observations, and introduce a sampling algorithm that combines the learned scores to approximately sample from the target efficiently.
Tomas Geffner · George Papamakarios · Andriy Mnih


Improving Conditional Score-Based Generation with Calibrated Classification and Joint Training (Poster)
The Score-based Generative Model (SGM) is a popular family of deep generative models that achieves leading image generation quality. Earlier works have extended SGMs to tackle class-conditional generation with the guidance of well-trained classifiers. Nevertheless, we find that classifier-guided SGMs do not actually achieve accurate conditional generation when evaluated with class-conditional measures. We argue that this lack of control stems from inaccurate gradients within the classifiers. We then propose to improve classifier-guided SGMs by calibrating the classifiers using principles from energy-based models. In addition, we design a joint-training architecture to further enhance conditional generation performance. Empirical results on CIFAR-10 demonstrate that the proposed model significantly improves conditional generation accuracy while maintaining similar generation quality. The results support the potential of memory-efficient SGMs for conditional generation based on classifier guidance.
Paul K. Huang · Si-An Chen · Hsuan-Tien Lin


Dimension reduction via score ratio matching (Poster)
We propose a method to detect a low-dimensional subspace where a non-Gaussian target distribution departs from a known reference distribution (e.g., a standard Gaussian). We identify this subspace from gradients of the log-ratio between the target and reference densities, which we call the score ratio. Given only samples from the target distribution, we estimate these gradients via score ratio matching, with a tailored parameterization and a regularization method that expose the low-dimensional structure we seek. We show that our approach outperforms standard score matching for dimension reduction of in-class distributions, and that several benchmark UCI datasets in fact exhibit this type of low dimensionality.
Michael Brennan · Ricardo Baptista · Youssef Marzouk


Spectral Diffusion Processes (Poster)
Score-based generative modelling (SGM) has proven to be a very effective method for modelling densities on finite-dimensional spaces. In this work we propose to extend this methodology to learn generative models over function spaces. To do so, we represent functional data in spectral space to dissociate the stochastic part of the processes from their space-time part. Using dimensionality reduction techniques, we then sample from their stochastic component using finite-dimensional SGM. We demonstrate our method's effectiveness for modelling various multimodal datasets.
Angus Phillips · Thomas Seror · Michael Hutchinson · Valentin De Bortoli · Arnaud Doucet · Emile Mathieu


Convergence of score-based generative modeling for general data distributions (Poster)
We give polynomial convergence guarantees for denoising diffusion models that do not rely on the data distribution satisfying functional inequalities or strong smoothness assumptions. Assuming an $L^2$-accurate score estimate, we obtain Wasserstein distance guarantees for any distribution with bounded support or sufficiently decaying tails, as well as TV guarantees for distributions satisfying further smoothness assumptions.
Holden Lee · Jianfeng Lu · Yixin Tan


A generic diffusion-based approach for 3D human pose prediction in the wild (Poster)
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging spatio-temporal task. It can be even more challenging in real-world applications, where occlusions inevitably happen and estimated 3D coordinates of joints contain some noise. We provide a unified formulation in which incomplete elements (whether in the prediction or the observation) are treated as noise, and propose a conditional diffusion model that denoises them and forecasts plausible poses. Instead of naively predicting all future frames at once, our model consists of two cascaded sub-models, each specialized for modeling short- and long-horizon distributions. We also propose a repairing step to improve the performance of any 3D pose forecasting model in the wild, by leveraging our diffusion model to repair the inputs. We investigate our findings on several datasets and obtain significant improvements over the state of the art. The code will be made available online.
Saeed Saadatnejad · Ali Rasekh · Mohammadreza Mofayezi · Yasamin Medghalchi · Sara Rajabzadeh · Taylor Mordan · Alexandre Alahi


Diffusion Models for Video Prediction and Infilling (Poster)
Video prediction and infilling require strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduce a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate the model on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation.
Tobias Höppe · Arash Mehrjou · Stefan Bauer · Didrik Nielsen · Andrea Dittadi


Discovering the Hidden Vocabulary of DALL-E 2 (Poster)
We discover that DALL-E 2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that
Giannis Daras · Alex Dimakis


Multiresolution Textual Inversion (Oral)
We extend Textual Inversion to learn pseudo-words that represent a concept at different resolutions. This allows us to generate images that use the concept at different resolutions and also to manipulate different resolutions using language. Once learned, the user can generate images that agree with the original concept at different levels of detail: "A photo of $S^*(0)$" produces the exact object, while the prompt "A photo of $S^*(0.8)$" only matches the rough outlines and colors. Our framework allows us to generate images that use different resolutions of an image (e.g., details, textures, styles) as separate pseudo-words that can be composed in various ways.
Giannis Daras · Alex Dimakis


Neural Volumetric Mesh Generator (Poster)
Deep generative models have shown success in generating 3D shapes with different representations. In this work, we propose the Neural Volumetric Mesh Generator (NVMG), which can generate novel and high-quality volumetric meshes. Unlike the point cloud, voxel, and implicit surface representations used by previous 3D generative models, a volumetric mesh is a ready-to-use representation in industry, with details on both the surface and the interior. Generating this kind of highly structured data thus brings a great challenge. To tackle this problem, we first propose to use a diffusion-based generative model to generate voxelized shapes with realistic shape and topology information. Given the voxelized shape, we can simply obtain a tetrahedral mesh as a template. Further, we use a voxel-conditional neural network to predict the surface conditioned on the voxels, and progressively project the tetrahedral mesh onto the predicted surface under regularization. As shown in the experiments, without any post-processing, our pipeline can generate high-quality, artifact-free volumetric and surface meshes.
Yan Zheng · Lemeng Wu · Xingchao Liu · Zhen Chen · Qiang Liu · Qixing Huang


Targeted Separation and Convergence with Kernel Discrepancies (Oral)
Kernel Stein discrepancies (KSDs) are maximum mean discrepancies (MMDs) that leverage the score information of distributions, and have grown central to a wide range of applications. In most settings, these MMDs are required to (i) separate a target $\mathrm{P}$ from other probability measures or even (ii) control weak convergence to $\mathrm{P}$. In this article we derive new sufficient and necessary conditions that substantially broaden the known conditions for KSD separation and convergence control, and develop the first KSDs known to metrize weak convergence to $\mathrm{P}$. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
Alessandro Barp · Carl-Johann Simon-Gabriel · Mark Girolami · Lester Mackey


Using Perturbation to Improve Goodness-of-Fit Tests based on Kernelized Stein Discrepancy (Poster)
Kernelized Stein discrepancy (KSD) is a score-based discrepancy widely employed in goodness-of-fit tests. It can be used even when the target distribution has an unknown normalising factor, such as in Bayesian analysis. We show theoretically and empirically that the power of the KSD test can be low when the target distribution has well-separated modes, which is due to insufficient data in regions where the score functions of the alternative and target distributions differ the most. To improve its test power, we propose to perturb the target and alternative distributions before applying the KSD test. The perturbation uses a Markov transition kernel that leaves the target invariant but perturbs alternatives. We provide numerical evidence that the proposed approach can lead to substantially higher power than the KSD test when the target and the alternative are mixture distributions that differ only in mixing weights.
Xing Liu · Andrew Duncan · Axel Gandy


Regularizing Score-based Models with Score Fokker-Planck Equations
(
Poster
)
link »
Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a PDE governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation characterizing the noise-conditional scores of the perturbed data densities (i.e., their gradients), termed the score FPE. Surprisingly, despite impressive empirical performance, we observe that scores learned via denoising score matching (DSM) do not satisfy the underlying score FPE. We mathematically analyze two implications of satisfying the score FPE and a potential explanation for why the score FPE is not satisfied in practice. Lastly, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and show its effectiveness on synthetic data and MNIST. 
ChiehHsin Lai · Yuhta Takida · Naoki Murata · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon 🔗 
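For background (standard material, not the paper's specific result): the perturbed densities $p_t$ of a diffusion $dx = f(x,t)\,dt + g(t)\,dw$ satisfy the Fokker-Planck equation, and the score FPE referred to above is the corresponding PDE for $\nabla_x \log p_t$:

```latex
% Fokker-Planck equation for the marginal densities p_t:
\partial_t p_t(x) = -\nabla_x \cdot \big( f(x,t)\, p_t(x) \big) + \tfrac{1}{2}\, g(t)^2\, \Delta_x p_t(x)
% Dividing by p_t gives an evolution equation for \log p_t; taking \nabla_x of
% that equation yields the PDE satisfied by the score s_t(x) = \nabla_x \log p_t(x).
```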


Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
(
Poster
)
link »
We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL·E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretization of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of the CLD does *not* reduce the complexity of SGMs.

Sitan Chen · Sinho Chewi · Jerry Li · Yuanzhi Li · Adil Salim · Anru Zhang 🔗 


Fast Sampling of Diffusion Models with Exponential Integrator
(
Poster
)
link »
Our goal is to develop a fast sampling method for Diffusion models (DMs) with a small number of steps while retaining high sample quality. To achieve this, we systematically analyze the sampling procedure in DMs and identify key factors that affect the sample quality, among which the method of discretization is most crucial. By carefully examining the learned diffusion process, we propose the Diffusion Exponential Integrator Sampler (DEIS). It is based on the Exponential Integrator designed for discretizing ordinary differential equations (ODEs) and leverages a semi-linear structure of the learned diffusion process to reduce the discretization error. The proposed method can be applied to any DM and can generate high-fidelity samples in as few as 10 steps. By directly using pre-trained DMs, we achieve superior sampling performance when the number of score function evaluations (NFE) is limited, e.g., 4.17 FID with 10 NFEs and 2.86 FID with only 20 NFEs on CIFAR-10. 
Qinsheng Zhang · Yongxin Chen 🔗 
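As a hedged sketch of the underlying numerical idea (a generic exponential Euler method, not DEIS itself): for a semi-linear ODE $dx/dt = a x + b(x, t)$, the linear part is integrated exactly and only $b$ is frozen over the step, which is what reduces discretization error; the coefficients below are illustrative:

```python
import math

def exp_euler_step(x, t, dt, a, b):
    """One exponential-Euler step for dx/dt = a*x + b(x, t):
    the linear term a*x is integrated exactly; b is frozen over the step."""
    ea = math.exp(a * dt)
    return ea * x + (ea - 1.0) / a * b(x, t)

# Usage: for dx/dt = a*x + c with constant c, the scheme is exact.
a, c, x0 = -2.0, 1.0, 3.0
x, t, dt = x0, 0.0, 0.25
for _ in range(4):                       # integrate from t = 0 to t = 1
    x = exp_euler_step(x, t, dt, a, lambda x, t: c)
    t += dt
exact = (x0 + c / a) * math.exp(a * 1.0) - c / a
print(x, exact)
```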


On Distillation of Guided Diffusion Models
(
Oral
)
link »
Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALL$\cdot$E 2, GLIDE and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time, since they require evaluating two diffusion models, a class-conditional model and an unconditional model, hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model into a diffusion model that requires far fewer sampling steps. On ImageNet 64x64 and CIFAR-10, our approach is able to generate images visually comparable to those of the original model using as few as 4 sampling steps, achieving FID/IS scores comparable to those of the original model while being up to 256 times faster to sample from.

Chenlin Meng · Ruiqi Gao · Diederik Kingma · Stefano Ermon · Jonathan Ho · Tim Salimans 🔗 


An optimal control perspective on diffusion-based generative modeling
(
Oral
)
link »
We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we develop a novel diffusion-based method for sampling from unnormalized densities, a problem frequently occurring in statistics and computational sciences. 
Julius Berner · Lorenz Richter · Karen Ullrich 🔗 


Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation
(
Poster
)
link »
Planning a motion in a cluttered environment is a recurring task autonomous agents need to solve. This paper presents a first attempt to learn generative models for collision-free trajectory generation based on conditioned score-based models. Given multiple navigation tasks, environment maps and collision-free trajectories precomputed with a sample-based planner, we learn a vision encoder of the map using a signed distance function loss and use its embedding to learn a conditioned score-based model for trajectory generation. A novelty of our method is to integrate conditioning variables, such as the latent representation of the environment and task features, into a temporal U-Net architecture using a cross-attention mechanism. We validate our approach in a simulated 2D planar navigation toy task, where a robot needs to plan a path that avoids obstacles in a scene. 
Joao Carvalho · Mark Baierl · Julen Urain · Jan Peters 🔗 


Convergence in KL and Rényi Divergence of the Unadjusted Langevin Algorithm Using Estimated Score
(
Poster
)
link »
We study the Unadjusted Langevin Algorithm (ULA) for sampling using an estimated score function when the target distribution satisfies a log-Sobolev inequality (LSI), motivated by Score-based Generative Modeling (SGM). We prove convergence in Kullback-Leibler (KL) divergence under a minimal sufficient assumption on the error of the score estimator, called the bounded Moment Generating Function (MGF) assumption. Our assumption is weaker than the previous assumption, which requires a finite $L^\infty$ norm of the error. Under the $L^\infty$ error assumption, we also prove convergence in R\'enyi divergence, which is stronger than KL divergence. On the other hand, under an $L^p$ error assumption for any $1 \leq p < \infty$, which is weaker than the bounded MGF assumption, we show that the stationary distribution of Langevin dynamics with an $L^p$-accurate score estimator can be arbitrarily far away from the desired distribution; thus, having an $L^p$-accurate score estimator cannot guarantee convergence. Our results suggest that controlling the mean squared error, the form of loss commonly used when estimating the score function with a neural network, is not enough to guarantee that the upstream algorithm will converge; hence, to obtain a theoretical guarantee, we need stronger control over the error in score matching. Despite requiring an exponentially decaying error probability, we give an example to demonstrate that the bounded MGF assumption is achievable when using a Kernel Density Estimation (KDE)-based score estimator.

Kaylee Y. Yang · Andre Wibisono 🔗 
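A minimal sketch of ULA with a plug-in score, the algorithm analyzed above; here the exact score of a standard normal stands in for the estimated score, and the step size and chain length are illustrative choices:

```python
import numpy as np

def ula(score, x0, step, n_steps, rng):
    """Unadjusted Langevin Algorithm:
    x_{k+1} = x_k + step * score(x_k) + sqrt(2 * step) * noise.
    Returns the trajectory after discarding the first half as burn-in."""
    x = np.array(x0, dtype=float)
    traj = []
    for k in range(n_steps):
        x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        if k >= n_steps // 2:
            traj.append(x.copy())
    return np.array(traj)

# Usage: target N(0, 1) with exact score s(x) = -x; the chain's stationary
# moments approach the target's, up to O(step) discretization bias.
rng = np.random.default_rng(1)
samples = ula(lambda x: -x, np.zeros(50), step=0.05, n_steps=4000, rng=rng)
print(samples.mean(), samples.var())
```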


On RKHS Choices for Assessing Graph Generators via Kernel Stein Statistics
(
Poster
)
link »
Score-based kernelised Stein discrepancy (KSD) tests have emerged as a powerful tool for goodness-of-fit testing, especially in high dimensions; however, the test performance may depend on the choice of kernel in the underlying reproducing kernel Hilbert space (RKHS). Here we assess the effect of RKHS choice for KSD tests of random network models, developed for exponential random graph models (ERGMs) in Xu and Reinert (2021) and for synthetic graph generators in Xu and Reinert (2022). We investigate the power performance and the computational runtime of the test in different scenarios, including both dense and sparse graph regimes. Experimental results on kernel performance for model assessment tasks are shown and discussed on synthetic and real-world network applications. 
Wenkai Xu · Gesine D Reinert · Moritz Weckbecker 🔗 


Proposal of a Score-Based Approach to Sampling Using Monte Carlo Estimation of Score and Oracle Access to Target Density
(
Poster
)
link »
Score-based approaches to sampling have shown much success as generative algorithms that produce new samples from a target density given a pool of initial samples. In this work, we consider the setting where we have no initial samples from the target density, but rather $0^{th}$- and $1^{st}$-order oracle access to the log likelihood. Such problems may arise in Bayesian posterior sampling, or in training a network from data. Using this knowledge alone, we propose a Monte Carlo method to estimate the score empirically as a particular expectation of a random variable. Using this estimator, we can then run a discrete version of the backward flow SDE to produce samples from the target density. This approach has the benefit of not relying on a pool of initial samples from the target density, and it does not rely on a neural network or other black-box model to estimate the score.

Curtis McDonald · Andrew Barron 🔗 


Diffusion Prior for Online Decision Making: A Case Study of Thompson Sampling
(
Poster
)
link »
In this work, we investigate the possibility of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandits framework, with the goal of learning a strategy that performs well across bandit tasks of the same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance the learned prior against the noisy observations that come from the learner's interaction with the environment. Preliminary experiments clearly demonstrate the potential of the considered approach. 
YuGuan Hsieh · Shiva Kasiviswanathan · Branislav Kveton · Patrick Blöbaum 🔗 


Scalable Causal Discovery with Score Matching
(
Poster
)
link »
This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational nonlinear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \operatorname{log}p(\mathbf{X})$, we extend the work of Rolland et al., 2022, which only recovers the topological order from the score and requires an expensive pruning step to discover the edges. Our analysis leads to DAS, a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with the current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.

Francesco Montagna · Nicoletta Noceti · Lorenzo Rosasco · Kun Zhang · Francesco Locatello 🔗 


All are Worth Words: a ViT Backbone for Score-based Diffusion Models
(
Poster
)
link »
Vision transformers (ViT) have shown promise in various vision tasks, including low-level ones, while the U-Net remains dominant in score-based diffusion models. In this paper, we perform a systematic empirical study of ViT-based architectures in diffusion models. Our results suggest that adding extra long skip connections (like the U-Net) to ViT is crucial to diffusion models. The new ViT architecture, together with other improvements, is referred to as U-ViT. On several popular visual datasets, U-ViT achieves generation results competitive with SOTA U-Nets while requiring a comparable amount of parameters and computation, if not less. 
Fan Bao · Chongxuan LI · Yue Cao · Jun Zhu 🔗 


Fine-tuning Diffusion Models with Limited Data
(
Poster
)
link »
Diffusion models have recently shown remarkable progress, demonstrating state-of-the-art image generation quality. Like other high-fidelity generative models, diffusion models require a large amount of data and computing time for stable training, which hinders their application in limited data settings. To overcome this issue, one can employ a pre-trained diffusion model built on a large-scale dataset and fine-tune it on a target dataset. Unfortunately, as we show empirically, this easily results in overfitting. In this paper, we propose an efficient fine-tuning algorithm for diffusion models that can train efficiently and robustly in limited data settings. We first show that fine-tuning only a small subset of the pre-trained parameters can efficiently learn the target dataset with much less overfitting. Then we further introduce a lightweight adapter module that can be attached to the pre-trained model with minimal overhead, and show that fine-tuning with our adapter module significantly improves the image generation quality. We demonstrate the effectiveness of our method on various real-world image datasets. 
Taehong Moon · Moonseok Choi · Gayoung Lee · JungWoo Ha · Juho Lee 🔗 


JPEG Artifact Correction using Denoising Diffusion Restoration Models
(
Poster
)
link »
Diffusion models can be used as learned priors for solving various inverse problems. However, most existing approaches are restricted to linear inverse problems, limiting their applicability to more general cases. In this paper, we build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some nonlinear inverse problems. We leverage the pseudoinverse operator used in DDRM and generalize this concept for other measurement operators, which allows us to use pretrained unconditional diffusion models for applications such as JPEG artifact correction. We empirically demonstrate the effectiveness of our approach across various quality factors, attaining performance levels that are on par with state-of-the-art methods trained specifically for the JPEG restoration task. 
Bahjat Kawar · Jiaming Song · Stefano Ermon · Michael Elad 🔗 


Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation
(
Poster
)
link »
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modelling the distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDEs) and derive discrete graph structures as the condition for the reverse generative process. We present a specialised hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the framework's effectiveness. The proposed method, in particular, still generates high-quality molecular graphs in a limited number of steps. 
Han Huang · Leilei Sun · Bowen Du · Weifeng Lv 🔗 


Denoising Diffusion for Sampling SAT Solutions
(
Poster
)
link »
Generating diverse solutions to the Boolean Satisfiability Problem (SAT) is a hard computational problem with practical applications for testing and functional verification of software and hardware designs. We explore generating such solutions using Denoising Diffusion coupled with a Graph Neural Network that implements the denoising function. We find that the obtained accuracy is similar to the currently best purely neural method, and the produced SAT solutions are highly diverse even if the system is trained with non-random solutions from a standard solver. 
Karlis Freivalds · Sergejs Kozlovičs 🔗 


When are equilibrium networks scoring algorithms?
(
Poster
)
link »
Principal Component Analysis (PCA) and its exponential family extensions have three components: observed variables, latent variables and parameters of a linear transformation. The likelihood of the observation is an exponential family with canonical parameters that are a linear transformation of the latent variables. We show how joint maximum a posteriori (MAP) estimates can be computed using a deep equilibrium model that computes roots of the score function. Our analysis provides a systematic way to relate neural network activation functions back to statistical assumptions about the observations. Our layers are implicitly differentiable, and can be fine-tuned in downstream tasks, as demonstrated on a synthetic task. 
Russell Tsuchida · Cheng Soon Ong 🔗 


Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
(
Poster
)
link »
We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions $\pi_0$ and $\pi_1$, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from $\pi_0$ and $\pi_1$ as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization, hence yielding computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of $\pi_0$ and $\pi_1$ into a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation and image-to-image translation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with \emph{a single Euler discretization step}.

Xingchao Liu · Chengyue Gong · Qiang Liu 🔗 
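A toy sketch of the idea in one dimension: for Gaussian endpoints the least-squares solution $v(x,t)=\mathbb{E}[X_1-X_0\mid X_t=x]$ is available in closed form (in practice it is fit with a neural network), so the learned ODE can be Euler-integrated directly. The endpoint distributions $N(0,1)$ and $N(4,1)$ are illustrative choices, not from the paper:

```python
import numpy as np

# Endpoints: pi_0 = N(0, 1), pi_1 = N(4, 1), independent coupling.
# With X_t = (1 - t) X0 + t X1, the rectified-flow drift is the conditional
# expectation v(x, t) = E[X1 - X0 | X_t = x], linear in x for Gaussians:
def drift(x, t):
    var_t = (1 - t) ** 2 + t ** 2            # Var(X_t)
    return 4.0 + (2 * t - 1) * (x - 4.0 * t) / var_t

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                # samples from pi_0
n_steps = 200
dt = 1.0 / n_steps
for k in range(n_steps):                     # Euler-integrate the ODE to t = 1
    x = x + dt * drift(x, k * dt)
print(x.mean(), x.var())                     # close to the pi_1 moments (4, 1)
```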


Let us Build Bridges: Understanding and Extending Diffusion Generative Models
(
Poster
)
link »
Diffusion-based generative models have achieved promising results recently, but raise an array of open questions in terms of conceptual understanding, theoretical analysis, algorithm improvement and extensions to discrete, structured, non-Euclidean domains. This work re-examines the overall framework in order to gain better theoretical understanding and develop algorithmic extensions for data from arbitrary domains. By viewing diffusion models as latent variable models with unobserved diffusion trajectories and applying maximum likelihood estimation (MLE) with latent trajectories imputed from an auxiliary distribution, we show that both the model construction and the imputation of latent trajectories amount to constructing diffusion bridge processes that achieve deterministic values and constraints at the end point, for which we provide a systematic study and a suite of tools. Leveraging our framework, we present a simple and unified approach to learning on data from different discrete and constrained domains. Experiments show that our methods perform superbly on generating images and semantic segments. 
Xingchao Liu · Lemeng Wu · Mao Ye · Qiang Liu 🔗 


Improved Marginal Unbiased Score Expansion (MUSE) via Implicit Differentiation
(
Poster
)
link »
We apply the technique of implicit differentiation to boost performance, reduce numerical error, and remove required user-tuning in the Marginal Unbiased Score Expansion (MUSE) algorithm for hierarchical Bayesian inference. We demonstrate these improvements on three representative inference problems: 1) an extended Neal's funnel, 2) Bayesian neural networks, and 3) probabilistic principal component analysis. On our particular test cases, MUSE with implicit differentiation is faster than Hamiltonian Monte Carlo by factors of 155, 397, and 5, respectively, or factors of 65, 278, and 1 without implicit differentiation, and yields good approximate marginal posteriors. The Julia and Python MUSE packages have been updated to use implicit differentiation, and can solve problems defined by hand or with any of a number of popular probabilistic programming languages and automatic differentiation backends. 
Marius Millea 🔗 


Few-Shot Diffusion Models
(
Poster
)
link »
Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence. 
Giorgio Giannone · Didrik Nielsen · Ole Winther 🔗 


Batch Denoising via Blahut-Arimoto
(
Poster
)
link »
In this work, we propose to solve batch denoising using the Blahut-Arimoto algorithm (BA). Batch denoising via BA (BDBA), similar to Deep Image Prior (DIP), is based on an untrained score-based generative model. Theoretical results show that our denoising estimate is highly likely to be close to the best result. Experimentally, we show that BDBA outperforms DIP significantly. 
Qing Li · Cyril Guyot 🔗 


Why Are Conditional Generative Models Better Than Unconditional Ones?
(
Poster
)
link »
Extensive empirical evidence demonstrates that conditional generative models are easier to train and perform better than unconditional ones by exploiting the labels of data. So do score-based diffusion models. In this paper, we analyze the phenomenon formally and identify that the key of conditional learning is to partition the data properly. Inspired by the analyses, we propose self-conditioned diffusion models (SCDM), which are trained conditioned on indices clustered by the $k$-means algorithm on features extracted by a model pre-trained in a self-supervised manner. SCDM significantly improves the unconditional model across various datasets and achieves a record-breaking FID of 3.94 on ImageNet 64x64 without labels. Besides, SCDM achieves a slightly better FID than the corresponding conditional model on CIFAR-10.

Fan Bao · Chongxuan LI · Jiacheng Sun · Jun Zhu 🔗 


Particle-based Variational Inference with Preconditioned Functional Gradient Flow
(
Poster
)
link »
Particle-based variational inference (VI) minimizes the KL divergence between model samples and the target posterior with gradient flow estimates. With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in a Reproducing Kernel Hilbert Space (RKHS) used to approximate the gradient flow. However, the requirement of an RKHS restricts the function class and algorithmic flexibility. This paper remedies the problem by proposing a general framework to obtain tractable functional gradient flow estimates. The functional gradient flow in our framework can be defined by a general functional regularization term that includes the RKHS norm as a special case. We also use our framework to propose a new particle-based VI algorithm: \emph{preconditioned functional gradient flow} (PFG). Compared with SVGD, the proposed preconditioned functional gradient method has several advantages: larger function classes; greater scalability in large particle-size scenarios; better adaptation to ill-conditioned target distributions; and provable continuous-time convergence in KL divergence. Both theoretical analysis and experiments have shown the effectiveness of our framework. 
Hanze Dong · Xi Wang · Yong Lin · Tong Zhang 🔗 
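For reference, a minimal sketch of the SVGD update (the RKHS-based baseline the abstract contrasts with); the RBF kernel, bandwidth, step size, and standard-normal target are illustrative choices, not the paper's:

```python
import numpy as np

def svgd_step(x, score, step=0.1, h=1.0):
    """One SVGD update with RBF kernel k(a, b) = exp(-||a - b||^2 / (2 h^2)):
    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]               # diff[i, j] = x_i - x_j
    k = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))    # symmetric kernel matrix
    drive = k @ score(x)                               # kernel-smoothed scores
    repulse = np.einsum('ij,ijd->id', k, diff) / h**2  # repulsion spreads particles
    return x + step * (drive + repulse) / n

# Usage: particles initialized far from a N(0, 1) target drift toward it
# while the repulsion term keeps them from collapsing to the mode.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 1)) + 5.0
for _ in range(500):
    x = svgd_step(x, lambda y: -y, step=0.5)
print(x.mean(), x.std())
```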


Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"
(
Poster
)
link »
Recent text-to-image diffusion models trained on large-scale data achieve remarkable performance on text-conditioned image synthesis (e.g., GLIDE, DALL∙E 2, Imagen, Stable Diffusion). This paper presents an embarrassingly simple method to use these text-to-image diffusion models as zero-shot image-to-image editors. Our method, CycleDiffusion, is based on a recent finding that, when the "random seed" is fixed, sampling from two diffusion model distributions will produce images with minimal differences, and the core of our idea is to infer the "random seed" that is likely to produce a source image conditioned on a source text. We formalize the "random seed" as a sequence of isometric Gaussian noises that we reformulate as diffusion models' latent code. Using the "random seed" inferred from the source text-image pair, we generate a target image conditioned on a target text. Experiments show that CycleDiffusion can minimally edit the image in a zero-shot manner. 
Chen Henry Wu · Fernando D De la Torre 🔗 


Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis
(
Poster
)
link »
Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although the simple generative process surprisingly works well, is this the best way to generate image data? For instance, despite the fact that human perception is more sensitive to the low frequencies of an image, diffusion models themselves do not consider any relative importance of each frequency component. Therefore, to incorporate the inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize the standard diffusion models by enabling diffusion in a rotated coordinate system with different velocities for each component of the vector. We further propose blur diffusion as a special case, where each frequency component of an image is diffused at a different speed. Specifically, the proposed blur diffusion consists of a forward process that blurs an image and adds noise gradually, after which a corresponding reverse process deblurs the image and removes noise progressively. Experiments show that the proposed model outperforms the previous method in FID on the LSUN bedroom and church datasets. 
Sangyun Lee · Hyungjin Chung · Jaehyeon Kim · Jong Chul Ye 🔗 


Towards Healing the Blindness of Score Matching
(
Poster
)
link »
Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these divergences for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate it. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches. 
Mingtian Zhang · Oscar Key · Peter Hayes · David Barber · Brooks Paige · Francois-Xavier Briol 🔗 