Spotlight
Tue Dec 04 01:15 PM -- 01:20 PM (PST) @ Room 220 CD
Geometry Based Data Generation
Ofir Lindenbaum · Jay Stanley · Guy Wolf · Smita Krishnaswamy

We propose a new type of generative model for high-dimensional data that learns the manifold geometry of the data, rather than its density, and can generate points evenly along this manifold. This is in contrast to existing generative models, which represent data density and are strongly affected by noise and other artifacts of data collection. We demonstrate how this approach corrects sampling biases and artifacts, and thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that our method, SUGAR, can generate hypothetical populations and can reveal intrinsic patterns and mutual-information relationships between genes in a single-cell RNA sequencing dataset of hematopoiesis.
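To make the idea of generating points "evenly" rather than in proportion to density more concrete, here is a minimal Python sketch. It is not the authors' SUGAR algorithm: the function names (`gaussian_degree`, `equalizing_counts`, `generate`), the Gaussian-kernel density proxy, the count heuristic, and the isotropic noise are all assumptions chosen for illustration, and the sketch does not include any step that learns or follows the manifold geometry. It only shows how drawing more new points around samples in sparse regions can counteract biased sampling.

```python
import numpy as np


def gaussian_degree(X, sigma):
    """Sum of Gaussian-kernel affinities per point: a rough proxy for local density."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2)).sum(axis=1)


def equalizing_counts(degree):
    """New points to draw around each sample so sparse regions approach the densest one.

    Assumed heuristic (not taken from the abstract): a neighborhood holding about
    d_i points needs about d_max - d_i more, shared among its d_i members.
    """
    return np.maximum(np.round(degree.max() / degree - 1.0), 0).astype(int)


def generate(X, sigma=0.5, noise=0.2, seed=0):
    """Draw isotropic Gaussian perturbations around each sample, more where data is sparse."""
    rng = np.random.default_rng(seed)
    counts = equalizing_counts(gaussian_degree(X, sigma))
    centers = np.repeat(X, counts, axis=0)  # row i is repeated counts[i] times
    return centers + rng.normal(scale=noise, size=centers.shape)


# Toy data with a sampling bias: 200 points in one cluster, 20 in another.
rng = np.random.default_rng(1)
dense = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2))
sparse = rng.normal(loc=[5.0, 0.0], scale=0.3, size=(20, 2))
X = np.vstack([dense, sparse])

combined = np.vstack([X, generate(X)])
print("left / right before:", len(dense), len(sparse))
print("left / right after: ",
      int(np.sum(combined[:, 0] < 2.5)), int(np.sum(combined[:, 0] >= 2.5)))
```

On this toy data the under-sampled cluster receives most of the generated points, so the two clusters end up much closer in size. A density-based generative model would instead reproduce the original imbalance.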