

Geometry Based Data Generation

Ofir Lindenbaum · Jay Stanley · Guy Wolf · Smita Krishnaswamy

Room 517 AB #121

Keywords: [ Clustering ] [ Kernel Methods ] [ Nonlinear Dimensionality Reduction and Manifold Learning ] [ Generative Models ] [ Classification ] [ Computational Biology and Bioinformatics ] [ Regression ] [ Missing Data ]


We propose a new type of generative model for high-dimensional data that learns the manifold geometry of the data, rather than its density, and can generate points evenly along this manifold. This contrasts with existing generative models, which represent data density and are strongly affected by noise and other artifacts of data collection. We demonstrate how this approach corrects sampling biases and artifacts, and thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that our method, SUGAR, can generate hypothetical populations and can reveal intrinsic patterns and mutual-information relationships between genes in a single-cell RNA sequencing dataset of hematopoiesis.
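The idea of generating points evenly along the data geometry, rather than in proportion to density, can be illustrated with a small sketch. This is not the SUGAR algorithm itself, which the abstract does not specify. It is a simplified stand-in that assumes a Gaussian kernel degree as a density proxy and isotropic Gaussian noise around each seed point. The function name `generate_sparse_fill` and the parameters `sigma` and `noise_scale` are hypothetical.

```python
import numpy as np

def generate_sparse_fill(X, sigma=0.5, noise_scale=0.1, seed=0):
    """Illustrative only: add more synthetic points where the data are sparse.

    X           : (n, d) array of observed points
    sigma       : bandwidth of the Gaussian kernel (assumed, not from the source)
    noise_scale : std. dev. of isotropic noise around each seed point (assumed)
    Returns (new_points, counts), where counts[i] is the number of points
    generated around X[i].
    """
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances between all observed points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

    # Kernel degree: large in dense regions, small in sparse ones.
    # Always >= 1 because each point is at distance 0 from itself.
    degree = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)

    # Generate enough points around each seed to bring its neighborhood
    # roughly up to the density of the densest region.
    counts = np.round(degree.max() / degree - 1.0).astype(int)

    batches = [
        X[i] + noise_scale * rng.standard_normal((c, X.shape[1]))
        for i, c in enumerate(counts)
        if c > 0
    ]
    new_points = np.vstack(batches) if batches else np.empty((0, X.shape[1]))
    return new_points, counts
```

In this sketch a rare subpopulation receives many more generated points per seed than a heavily sampled one, which is the density-equalizing behavior the abstract describes. A faithful implementation would additionally need to keep generated points on the learned manifold, which plain isotropic noise does not guarantee.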
