This two-day workshop will bring together researchers from the various subdisciplines of Geometric Data Analysis, such as manifold learning, topological data analysis, and shape analysis; it will showcase recent progress in this field and establish directions for future research. The focus will be on high-dimensional and big data, and on mathematically founded methodology.
Specific aims
=============
One aim of this workshop is to build connections between Topological Data Analysis on one side and Manifold Learning on the other. This is starting to happen, after years of more or less separate evolution of the two fields. The moment has been reached when the mathematical, statistical and algorithmic foundations of both areas are mature enough; it is now time to lay the foundations for a joint topological and differential geometric understanding of data, and this workshop will explicitly focus on this process.
The second aim is to bring GDA closer to real applications. We see the challenge of real problems and real data as a motivator for researchers to explore new research questions, to reframe and expand the existing theory, and to step out of their own subarea. In particular, we want researchers in GDA to see TDA and manifold learning as parts of one field.
The impact of GDA in practice also depends on having scalable implementations of the most recent theoretical results. This workshop will showcase the GDA tools that achieve this and initiate a collective discussion about the tools that still need to be built.
We intend this workshop to be a forum for researchers in all areas of Geometric Data Analysis. Through the tutorials, we are reaching out to the wider NIPS audience, to the many potential users of Geometric Data Analysis, to make them aware of the state of the art in GDA and of the tools available. Last but not least, we hope that the scientists invited will bring these methods back to their communities.
Fri 8:10 a.m. – 9:10 a.m.

Supervised learning of labeled pointcloud differences via covertree entropy reduction
(
Invited talk
)
We introduce a new algorithm, called CDER, for supervised machine learning that merges the multiscale geometric properties of Cover Trees with the information-theoretic properties of entropy. CDER applies to a training set of labeled pointclouds embedded in a common Euclidean space. If typical pointclouds corresponding to distinct labels tend to differ at any scale in any subregion, CDER can identify these differences in linear time, creating a set of distributional coordinates which act as a feature extraction mechanism for supervised learning. We describe the use of CDER both directly on point clouds and on persistence diagrams. 
John L Harer 🔗 
Fri 9:10 a.m. – 9:40 a.m.

Estimating the Reach of a Manifold
(
Talk
)
Various problems in manifold estimation make use of a quantity called the reach, denoted by τ_M, which is a measure of the regularity of the manifold. This paper is the first investigation into the problem of how to estimate the reach. First, we study the geometry of the reach through an approximation perspective. We derive new geometric results on the reach for submanifolds without boundary. An estimator τ̂ of τ_M is proposed in a framework where tangent spaces are known, and bounds assessing its efficiency are derived. In the case of an i.i.d. random point cloud X_n, τ̂(X_n) is shown to achieve uniform expected loss bounds over a C³-like model. Finally, we obtain upper and lower bounds on the minimax rate for estimating the reach. 
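As background (not part of the abstract): the reach is usually defined, following Federer, as the largest radius within which projection onto the manifold is unique,

```latex
\tau_M \;=\; \sup\left\{ r \ge 0 \;:\; \text{every } x \in \mathbb{R}^D
\text{ with } \mathrm{d}(x, M) < r \text{ has a unique nearest point in } M \right\}.
```

A small τ_M indicates high curvature or near self-intersection, which is why many sampling-based guarantees in manifold estimation are stated in terms of the reach.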
Eddie Aamari 🔗 
Fri 9:40 a.m. – 10:10 a.m.

Multiscale geometric feature extraction
(
Talk
)

Wolfgang Polonik 🔗 
Fri 10:10 a.m. – 10:30 a.m.

Poster spotlights
(
Spotlights
)
Each poster presenter will have approximately 1.5 minutes to advertise their poster. We encourage all poster presenters to put up their posters at the beginning of the workshop. 
🔗 
Fri 10:15 a.m. – 6:00 p.m.

Parallel multiscale reduction of persistent homology
(
Poster
)
Persistent homology is a mathematical formalism based on algebraic topology and is central to Topological Data Analysis (TDA). Its paradigm consists in estimating the topological features of a shape embedded in a Euclidean space from a pointcloud sampled from it. The estimation is done at multiple scales by reducing a so-called boundary matrix, a sparse binary matrix that encodes a simplicial complex filtration built from the pointcloud. The reduction process is similar to Gaussian elimination and represents an important computational bottleneck in the pipeline. To improve the scalability of the TDA framework, several strategies to accelerate it have been proposed. Herein, we present a number of structural dependencies in boundary matrices and use them to design a novel parallel reduction algorithm. In particular, we show that this structural information: (i) makes part of the reduction immediately apparent, (ii) decreases the total number of column operations required for reduction, (iii) gives a framework in which the process can be massively parallelised. Simulations on synthetic examples show that the reduction can be carried out in a small fraction of the number of iterations needed by traditional methods. Moreover, whereas the traditional methods reveal barcodes sequentially from a filtration order, this approach gives an alternative method by which barcodes are partly revealed for multiple scales simultaneously and further refined as the algorithm progresses. Specifically, our numerical experiments show that for a Vietoris–Rips filtration with 10^4 simplices, the essential topological information can be estimated with 95% precision in two iterations, and that the reduction is completed to within 1% in about ten iterations of our algorithm, as opposed to approximately eight thousand iterations for traditional methods. 
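For readers unfamiliar with the reduction the abstract refers to, here is a minimal sketch of the standard sequential baseline (illustrative only, not the poster's parallel algorithm), with columns stored as sets of row indices over GF(2):

```python
# Minimal sketch (not the poster's algorithm) of the standard sequential
# boundary-matrix reduction used in persistent homology. Columns are stored
# as sets of nonzero row indices over GF(2).

def low(col):
    """Largest row index with a nonzero entry, or None if the column is empty."""
    return max(col) if col else None

def reduce_boundary_matrix(columns):
    """Left-to-right reduction: while an earlier reduced column shares the
    same low index, add it (mod 2, via symmetric difference) to the current one."""
    lows = {}  # low index -> index of the column that owns it
    for j, col in enumerate(columns):
        while col and low(col) in lows:
            col ^= columns[lows[low(col)]]  # in-place mod-2 column addition
        if col:
            lows[low(col)] = j
    return columns

# Filtration of a filled triangle: vertices 0,1,2, then edges {0,1},{0,2},{1,2},
# then the triangle face {3,4,5} (rows index previously added simplices).
cols = [set(), set(), set(), {0, 1}, {0, 2}, {1, 2}, {3, 4, 5}]
reduced = reduce_boundary_matrix(cols)
# reduced[5] is empty: edge {1,2} closes a 1-cycle, later killed by the triangle.
```

The pairing of each nonzero low entry with its column is what yields the persistence barcodes; the poster's contribution is exploiting structural dependencies to perform these column operations in parallel.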
Rodrigo Mendoza Smith 🔗 
Fri 10:15 a.m. – 6:00 p.m.

A dual framework for low rank tensor completion
(
Poster
)
We propose a novel formulation of the low-rank tensor completion problem that is based on duality theory and a particular choice of low-rank regularizer. This low-rank regularizer, along with the dual perspective, provides a simple characterization of the solution to the tensor completion problem. Motivated by the large-scale setting, we next derive a rank-constrained reformulation of the proposed optimization problem, which is shown to lie on the Riemannian spectrahedron manifold. We exploit the versatile Riemannian optimization framework to develop computationally efficient conjugate gradient and trust-region algorithms. The experiments confirm the benefits of our choice of regularization, and the proposed algorithms outperform state-of-the-art algorithms on several real-world data sets in different applications. 
Madhav Nimishakavi 🔗 
Fri 10:15 a.m. – 6:00 p.m.

Maximum likelihood estimation of Riemannian metrics from Euclidean data
(
Poster
)
link »

Georgios Arvanitidis 🔗 
Fri 10:30 a.m. – 11:00 a.m.

Coffee break

🔗 
Fri 11:00 a.m. – 11:30 a.m.

Persistent homology of KDE filtration of Rips complexes
(
Talk
)
When we observe a point cloud in Euclidean space, the persistent homology of the upper level set filtration of the density is one of the most important tools for understanding topological features of the data-generating distribution. The persistent homology of KDEs (kernel density estimators) of the density function is a natural way to estimate the target quantity. In practice, however, calculating the persistent homology of KDEs on d-dimensional Euclidean spaces requires approximating the ambient space by a grid, which can be computationally inefficient when the dimension of the ambient space is high or when topological features live at different scales. In this abstract, we consider the persistent homology of KDE filtrations on Rips complexes as an alternative estimator. We show consistency results both for the persistent homology of the upper level set filtration of the density and for its simplified version. We also describe a novel methodology to construct an asymptotic confidence set based on the bootstrap procedure. Unlike existing procedures, our method does not heavily rely on grid approximations, scales to higher dimensions, and is adaptive to heterogeneous topological features. 
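As a toy illustration of the grid-free idea (a hedged sketch, not the authors' code): a Gaussian KDE can be evaluated directly at the sample points themselves, and these values can then filter a Rips complex built on the same points, with no ambient grid involved:

```python
import numpy as np

# Hedged sketch (not the authors' code): evaluate a Gaussian KDE directly at
# the sample points, avoiding any grid. These values are what would filter a
# Rips complex built on the same sample.

def kde_at_samples(X, h):
    """f_h(x_i) = (1/n) * sum_j K_h(x_i - x_j), evaluated at each sample x_i."""
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    K = np.exp(-sq / (2 * h ** 2)) / ((2 * np.pi * h ** 2) ** (d / 2))
    return K.mean(axis=1)

X = np.random.default_rng(0).normal(size=(100, 2))
f = kde_at_samples(X, h=0.5)  # one density estimate per sample point; no grid
```

The cost is O(n²) pairwise distances rather than O(grid size), which is what makes the approach independent of the ambient dimension.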
Jaehyeok Shin · Alessandro Rinaldo 🔗 
Fri 11:30 a.m. – 11:55 a.m.

Characterizing nonlinear dimensionality reduction methods using Laplacian-like operators
(
Talk
)
(Note: the talk is 30 minutes, but the server has problems with 12:00 noon.) We examine a number of nonlinear dimensionality reduction techniques, including Laplacian Eigenmaps, LLE, MVU, HLLE, LTSA, and t-SNE. In each case we show that the nonlinear embedding can be characterized by a Laplacian or Laplacian-like operator. By comparing the resulting operators, one can uncover the similarities and differences between the methods. For example, HLLE and LTSA can be shown to be asymptotically identical, and whilst maximum variance unfolding (MVU) can be shown to generate a Laplacian, the behavior of that Laplacian is completely different from the one generated by Laplacian Eigenmaps. We discuss the implications of this characterization for generating new nonlinear dimensionality reduction methods and smoothness penalties. 
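To fix ideas, here is a minimal dense implementation of Laplacian Eigenmaps, one of the methods the talk characterizes; the Gaussian-affinity construction and the parameter choices are illustrative assumptions, not the speaker's code:

```python
import numpy as np

# Minimal dense sketch of Laplacian Eigenmaps: embed points via the bottom
# eigenvectors of a graph Laplacian built from a Gaussian affinity matrix.
# Illustrative only; practical implementations use sparse k-NN graphs.

def laplacian_eigenmaps(X, dim, eps):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-sq / eps)            # Gaussian affinity matrix
    np.fill_diagonal(W, 0.0)         # no self-affinity
    L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]        # skip the constant eigenvector

X = np.random.default_rng(1).uniform(size=(50, 3))
Y = laplacian_eigenmaps(X, dim=2, eps=0.1)  # a 2-dimensional embedding
```

It is this operator L (and its analogues for LLE, HLLE, LTSA, MVU, and t-SNE) that the talk uses as the common object of comparison across methods.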
Daniel Ting 🔗 
Fri 1:30 p.m. – 2:00 p.m.

Poster session I
(
Poster
)

🔗 
Fri 2:00 p.m. – 3:00 p.m.

Multiscale characterization of molecular dynamics
(
Invited talk
)

Cecilia Clementi 🔗 
Fri 3:30 p.m. – 4:00 p.m.

Functional Data Analysis using a Topological Summary Statistic: the Smooth Euler Characteristic Transform
(
Talk
)
Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, and Raúl Rabadán 
Lorin Crawford 🔗 
Fri 4:00 p.m. – 4:30 p.m.

Consistent manifold representation for TDA
(
Talk
)

Timothy Sauer 🔗 
Fri 5:00 p.m. – 6:00 p.m.

Discussion: Geometric Data Analysis
(
Discussion
)
One aim of this workshop is to build connections between Topological Data Analysis on one side and Manifold Learning on the other. The moment has been reached when the mathematical, statistical and algorithmic foundations of both areas are mature enough; it is now time to lay the foundations for joint topological and differential geometric understanding of data, and this discussion will explicitly focus on this process. The second aim is to bring GDA closer to real applications. We see the challenge of real problems and real data as a motivator for researchers to explore new research questions, to reframe and expand the existing theory, and to step out of their own subarea. 
Frederic Chazal · Marina Meila 🔗 
Sat 8:30 a.m. – 8:50 a.m.

Topological Data Analysis with GUDHI and scalable manifold learning and clustering with megaman
(
Tutorial
)
link »
Presentation and demo of the GUDHI library for Topological Data Analysis, followed by a presentation of the megaman package. The aim of the presentations will be to give an introduction for beginners to the practical side of GDA, and to give an overview of the software capabilities. The presenters will leave ample time for questions and will be available during poster sessions for more detailed discussions and demos. http://gudhi.gforge.inria.fr/ http://github.com/mmp2/megaman 
Vincent Rouvreau · Marina Meila 🔗 
Sat 8:50 a.m. – 9:20 a.m.

Introduction to the R package TDA
(
Tutorial
)
link »
We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus, and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree. 
Jisu KIM 🔗 
Sat 9:20 a.m. – 9:50 a.m.

Riemannian metric estimation and the problem of isometric embedding
(
Talk
)

Dominique Perrault-Joncas 🔗 
Sat 9:50 a.m. – 10:20 a.m.

Ordinal distance comparisons: from topology to geometry
(
Invited talk
)

Ulrike von Luxburg 🔗 
Sat 10:20 a.m. – 10:50 a.m.

Coffee break

🔗 
Sat 10:50 a.m. – 11:30 a.m.

Geometric Data Analysis software
(
Discussion
)
We invite the GDA community to discuss the good, the bad, and the ways forward in software for GDA. [NOTE THE START TIME: 10 minutes before the official end of the break] 
🔗 
Sat 11:30 a.m. – 11:55 a.m.

Poster session II
(
Poster
)

🔗 
Sat 2:00 p.m. – 2:30 p.m.

Modal-sets, and density-based Clustering
(
Talk
)
Modes, or modal-sets, are points or regions of space where the underlying data density is locally maximal. They are relevant in problems such as clustering and outlier detection, or can simply serve to identify salient structures in high-dimensional data (e.g., point-cloud data from medical imaging, astronomy, etc.). In this talk we will argue that modal-sets, as general extremal surfaces, yield more stable clustering than the usual modes (extremal points) of a density. For one, modal-sets can be consistently estimated, at nontrivial convergence rates, despite the added complexity of unknown surface shape and dimension. Furthermore, modal-sets neatly dovetail with existing approaches that cluster data around point-modes, yielding competitive, yet more stable, clustering on a range of real-world problems. 
Samory Kpotufe 🔗 
Sat 2:30 p.m. – 3:00 p.m.

A Note on Community Trees in Networks
(
Talk
)
We introduce the concept of community trees, which summarize topological structures within a network. A community tree is a tree structure representing clique communities from the clique percolation method (CPM). The community tree also generates a persistence diagram. Community trees and persistence diagrams reveal topological structures of the underlying networks and can be used as visualization tools. We study the stability of community trees and derive a quantity called the total star number (TSN) that gives an upper bound on the change of community trees. Our findings provide a topological interpretation for the stability of communities generated by the CPM. 
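For concreteness, here is a brute-force sketch of the clique percolation method that community trees summarize (illustrative only; real implementations enumerate maximal cliques rather than all k-subsets): two k-cliques belong to the same community when they can be linked by a chain of k-cliques, each adjacent pair sharing k−1 vertices.

```python
from itertools import combinations

# Brute-force sketch of the clique percolation method (CPM): a k-clique
# community is a connected component of the graph whose nodes are k-cliques,
# with two k-cliques adjacent iff they share k-1 vertices.

def k_clique_communities(edges, k):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # enumerate all k-cliques (exponential; fine for a toy example)
    cliques = [c for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # union-find over cliques that share k-1 vertices
    parent = list(range(len(cliques)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(set(cliques[i]) & set(cliques[j])) == k - 1:
            parent[find(i)] = find(j)
    comms = {}
    for i, c in enumerate(cliques):
        comms.setdefault(find(i), set()).update(c)
    return sorted(map(sorted, comms.values()))

# Two triangles sharing an edge form a single 3-clique community.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
communities = k_clique_communities(edges, 3)  # [[0, 1, 2, 3]]
```

Varying k and recording how these communities split and merge is, roughly, the structure the talk's community tree encodes.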
YenChi Chen 🔗 
Sat 3:30 p.m. – 4:00 p.m.

Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces
(
Talk
)
https://hal.inria.fr/hal01159235/document 
Frederic Cazals 🔗 
Author Information
Marina Meila (University of Washington)
Frederic Chazal (INRIA)
Yu-Chia Chen (University of Washington)
More from the Same Authors

2021 Oral: The decomposition of the higher-order homology embedding constructed from the $k$-Laplacian »
Yu-Chia Chen · Marina Meila 
2021 Poster: The decomposition of the higher-order homology embedding constructed from the $k$-Laplacian »
Yu-Chia Chen · Marina Meila 
2020 : Closing Remarks »
Frederic Chazal · Smita Krishnaswamy · Roland Kwitt · Karthikeyan Natesan Ramamurthy · Bastian Rieck · Yuhei Umeda · Guy Wolf 
2020 Workshop: Topological Data Analysis and Beyond »
Bastian Rieck · Frederic Chazal · Smita Krishnaswamy · Roland Kwitt · Karthikeyan Natesan Ramamurthy · Yuhei Umeda · Guy Wolf 
2020 : Opening Remarks »
Frederic Chazal · Smita Krishnaswamy · Roland Kwitt · Karthikeyan Natesan Ramamurthy · Bastian Rieck · Yuhei Umeda · Guy Wolf 
2020 Poster: PLLay: Efficient Topological Layer based on Persistent Landscapes »
Kwangho Kim · Jisu Kim · Manzil Zaheer · Joon Kim · Frederic Chazal · Larry Wasserman 
2020 Session: Orals & Spotlights Track 27: Unsupervised/Probabilistic »
Marina Meila · Kun Zhang 
2019 Poster: Selecting the independent coordinates of manifolds with large aspect ratios »
Yu-Chia Chen · Marina Meila 
2018 : Invited Talk 1 »
Marina Meila 
2018 Poster: How to tell when a clustering is (approximately) correct using convex relaxations »
Marina Meila 
2017 : Topological Data Analysis with GUDHI and scalable manifold learning and clustering with megaman »
Vincent Rouvreau · Marina Meila 
2017 : Discussion: Geometric Data Analysis »
Frederic Chazal · Marina Meila 
2017 Poster: Improved Graph Laplacian via Geometric Self-Consistency »
Dominique Perrault-Joncas · Marina Meila · James McQueen 
2016 Poster: Data driven estimation of Laplace-Beltrami operator »
Frederic Chazal · Ilaria Giulini · Bertrand Michel 
2016 Poster: Nearly Isometric Embedding by Relaxation »
James McQueen · Marina Meila · Dominique Perrault-Joncas 
2016 Poster: Graph Clustering: Blockmodels and model free results »
Yali Wan · Marina Meila 
2015 Poster: A class of network models recoverable by spectral clustering »
Yali Wan · Marina Meila 
2014 Poster: Recursive Inversion Models for Permutations »
Christopher Meek · Marina Meila 
2011 Poster: Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators »
Dominique C Perrault-Joncas · Marina Meila 
2011 Spotlight: Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators »
Dominique C Perrault-Joncas · Marina Meila 
2009 Workshop: Learning with Orderings »
Tiberio Caetano · Carlos Guestrin · Jonathan Huang · Risi Kondor · Guy Lebanon · Marina Meila 
2007 Workshop: Topology Learning: New Challenges At the Crossing of Machine Learning, »
Michael Aupetit · Frederic Chazal · Gilles Gasso · David Cohen-Steiner · Pierre Gaillard