Workshop
Synergies in Geometric Data Analysis (TWO DAYS)
Marina Meila · Frederic Chazal · Yu-Chia Chen

Fri Dec 08 08:00 AM -- 06:30 PM (PST) @ 102 C

This two-day workshop will bring together researchers from the various subdisciplines of Geometric Data Analysis, such as manifold learning, topological data analysis, and shape analysis; it will showcase recent progress in this field and establish directions for future research. The focus will be on high-dimensional and big data, and on mathematically founded methodology.


Specific aims
=============
One aim of this workshop is to build connections between Topological Data Analysis on one side and Manifold Learning on the other. This is starting to happen, after years of more or less separate evolution of the two fields. The moment has been reached when the mathematical, statistical and algorithmic foundations of both areas are mature enough -- it is now time to lay the foundations for a joint topological and differential geometric understanding of data, and this workshop will explicitly focus on this process.

The second aim is to bring GDA closer to real applications. We see the challenge of real problems and real data as a motivator for researchers to explore new research questions, to reframe and expand the existing theory, and to step out of their own sub-area. In particular, we want researchers in GDA to see TDA and manifold learning as parts of a single field.

The impact of GDA in practice also depends on having scalable implementations of the most recent theoretical results. This workshop will showcase the GDA tools that achieve this and will initiate a collective discussion about the tools that still need to be built.

We intend this workshop to be a forum for researchers in all areas of Geometric Data Analysis. Through the tutorials, we are reaching out to the wider NIPS audience and to the many potential users of Geometric Data Analysis, to make them aware of the state of the art in GDA and of the tools available. Last but not least, we hope that the invited scientists will bring these methods back to their communities.

Fri 8:10 a.m. - 9:10 a.m.

We introduce a new algorithm, called CDER, for supervised machine learning that merges the multi-scale geometric properties of Cover Trees with the information-theoretic properties of entropy. CDER applies to a training set of labeled pointclouds embedded in a common Euclidean space. If typical pointclouds corresponding to distinct labels tend to differ at any scale in any sub-region, CDER can identify these differences in linear time, creating a set of distributional coordinates which act as a feature extraction mechanism for supervised learning. We describe the use of CDER both directly on point clouds and on persistence diagrams.

John L Harer
Fri 9:10 a.m. - 9:40 a.m.

Various problems in manifold estimation make use of a quantity called the reach, denoted by τ_M, which is a measure of the regularity of the manifold. This paper is the first investigation into the problem of how to estimate the reach. First, we study the geometry of the reach through an approximation perspective. We derive new geometric results on the reach for submanifolds without boundary. An estimator τ̂ of τ_M is proposed in a framework where tangent spaces are known, and bounds assessing its efficiency are derived. In the case of an i.i.d. random point cloud X_n, τ̂(X_n) is shown to achieve uniform expected loss bounds over a C³-like model. Finally, we obtain upper and lower bounds on the minimax rate for estimating the reach.

Eddie Aamari
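
As an illustration of the quantity being estimated, here is a minimal numpy sketch of a plug-in reach estimate of the form inf over pairs x != y of ||y - x||^2 / (2 dist(y - x, T_x M)), assuming the tangent spaces are known and supplied as orthonormal bases. This is only a toy illustration of the definition; the actual estimator and its guarantees are in the paper.

```python
import numpy as np

def reach_estimate(points, tangent_bases):
    """Plug-in reach estimate from a point cloud with known tangent spaces.

    points        : (n, D) array of samples on the manifold.
    tangent_bases : list of (D, d) arrays; columns form an orthonormal basis
                    of the tangent space at the corresponding point.
    Returns the infimum over pairs (x, y) of ||y - x||^2 / (2 * dist(y - x, T_x M)).
    """
    n = points.shape[0]
    best = np.inf
    for i in range(n):
        T = tangent_bases[i]                      # orthonormal basis at x_i
        diffs = points - points[i]                # all vectors y - x_i
        sq_norms = np.einsum('ij,ij->i', diffs, diffs)
        tangential = diffs @ T @ T.T              # projection onto T_{x_i} M
        normal_dist = np.linalg.norm(diffs - tangential, axis=1)
        mask = normal_dist > 1e-12                # skip y = x_i and tangent directions
        if mask.any():
            best = min(best, np.min(sq_norms[mask] / (2.0 * normal_dist[mask])))
    return best

# toy check: a circle of radius R has reach exactly R
R, n = 2.0, 200
theta = np.random.uniform(0, 2 * np.pi, n)
pts = np.c_[R * np.cos(theta), R * np.sin(theta)]
bases = [np.array([[-np.sin(t)], [np.cos(t)]]) for t in theta]
print(reach_estimate(pts, bases))   # close to R = 2
```
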
Fri 9:40 a.m. - 10:10 a.m.
Multiscale geometric feature extraction (Talk)
Wolfgang Polonik
Fri 10:10 a.m. - 10:30 a.m.

Each poster presenter will have approximately 1.5 minutes to advertise their poster. We encourage all poster presenters to put up their posters at the beginning of the workshop.

Fri 10:15 a.m. - 6:00 p.m.

Persistent homology is a mathematical formalism based on algebraic topology and is central to Topological Data Analysis (TDA). Its paradigm consists in estimating the topological features of a shape embedded in a Euclidean space from a point cloud sampled from it. The estimation is done at multiple scales by reducing a so-called boundary matrix, a sparse binary matrix that encodes a simplicial complex filtration built from the point cloud. The reduction process is similar to Gaussian elimination and represents an important computational bottleneck in the pipeline. To improve the scalability of the TDA framework, several strategies to accelerate it have been proposed. Herein, we present a number of structural dependencies in boundary matrices and use them to design a novel parallel reduction algorithm. In particular, we show that this structural information: (i) makes part of the reduction immediately apparent, (ii) decreases the total number of column operations required for reduction, and (iii) gives a framework in which the process can be massively parallelised. Simulations on synthetic examples show that the computation can be carried out in a small fraction of the number of iterations needed by traditional methods. Moreover, whereas traditional methods reveal barcodes sequentially in filtration order, this approach partly reveals barcodes at multiple scales simultaneously and refines them further as the algorithm progresses. Specifically, our numerical experiments show that for a Vietoris-Rips filtration with 10^4 simplices, the essential topological information can be estimated with 95% precision in two iterations, and that the reduction is completed to within 1% in about ten iterations of our algorithm, as opposed to approximately eight thousand iterations for traditional methods.

Rodrigo Mendoza Smith
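
For context, the sketch below is the standard sequential column reduction of a boundary matrix, the Gaussian-elimination-like step that the talk parallelizes; the parallel algorithm itself is described in the paper. Column j of D is assumed to list the codimension-1 faces of simplex j, with simplices ordered by the filtration.

```python
import numpy as np

def reduce_boundary_matrix(D):
    """Standard left-to-right column reduction of a binary boundary matrix (mod 2).

    D is an (m, m) 0/1 array in filtration order.  Returns the reduced matrix R
    and the persistence pairs (birth simplex, death simplex).
    """
    R = D.copy() % 2
    m = R.shape[1]
    low = {}                                   # pivot row -> column that owns it
    pairs = []
    for j in range(m):
        while True:
            rows = np.flatnonzero(R[:, j])
            if rows.size == 0:
                break                          # zero column: simplex j creates a class
            piv = rows[-1]                     # lowest nonzero entry of column j
            if piv not in low:
                low[piv] = j
                pairs.append((piv, j))         # simplex piv is killed by simplex j
                break
            R[:, j] = (R[:, j] + R[:, low[piv]]) % 2
    return R, pairs
```
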
Fri 10:15 a.m. - 6:00 p.m.

We propose a novel formulation of the low-rank tensor completion problem that is based on duality theory and a particular choice of low-rank regularizer. This low-rank regularizer, together with the dual perspective, provides a simple characterization of the solution to the tensor completion problem. Motivated by the large-scale setting, we next derive a rank-constrained reformulation of the proposed optimization problem, which is shown to lie on the Riemannian spectrahedron manifold. We exploit the versatile Riemannian optimization framework to develop computationally efficient conjugate gradient and trust-region algorithms. The experiments confirm the benefits of our choice of regularization, and the proposed algorithms outperform state-of-the-art algorithms on several real-world data sets in different applications.

Madhav Nimishakavi
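
For readers new to the problem setup, the sketch below solves the matrix (order-2 tensor) version of low-rank completion with a simple SVD-projection baseline. It is not the dual Riemannian spectrahedron algorithm proposed in the talk, only a reference point for what completion under a rank constraint means.

```python
import numpy as np

def fixed_rank_completion(M_obs, mask, rank, step=1.0, n_iter=200):
    """Low-rank completion by projected gradient onto the fixed-rank set.

    M_obs : (m, n) matrix with observed entries (arbitrary values elsewhere).
    mask  : boolean (m, n) array, True where an entry is observed.
    A simple baseline, not the dual Riemannian algorithm from the talk.
    """
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        grad = mask * (X - M_obs)                 # gradient of 0.5*||P_Omega(X - M)||^2
        Y = X - step * grad
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # project back onto rank-`rank` matrices
    return X
```
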
Fri 10:15 a.m. - 6:00 p.m.
Maximum likelihood estimation of Riemannian metrics from Euclidean data (Poster)
Georgios Arvanitidis
Fri 10:30 a.m. - 11:00 a.m.
Coffee break (Break)
Fri 11:00 a.m. - 11:30 a.m.

When we observe a point cloud in Euclidean space, the persistent homology of the upper level sets filtration of the density is one of the most important tools to understand topological features of the data-generating distribution. The persistent homology of KDEs (kernel density estimators) for the density function is a natural way to estimate the target quantity. In practice, however, calculating the persistent homology of KDEs on d-dimensional Euclidean spaces requires approximating the ambient space by a grid, which can be computationally inefficient when the dimension of the ambient space is high or when topological features live at different scales. In this abstract, we consider the persistent homologies of KDE filtrations on Rips complexes as alternative estimators. We show consistency results for both the persistent homology of the upper level sets filtration of the density and its simplified version. We also describe a novel methodology to construct an asymptotic confidence set based on the bootstrap procedure. Unlike existing procedures, our method does not heavily rely on grid approximations, scales to higher dimensions, and is adaptive to heterogeneous topological features.

Jaehyeok Shin, Alessandro Rinaldo
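
As a simplified illustration of the bootstrap ingredient, in the spirit of the sup-norm confidence bands of Fasy et al. (2014), the sketch below evaluates a Gaussian KDE at chosen points and bootstraps the sup-norm deviation; features of the persistence diagram with persistence larger than twice the returned band are deemed significant. This is an assumption-laden sketch, not the grid-free procedure proposed in the abstract.

```python
import numpy as np

def gaussian_kde(eval_pts, data, h):
    """Gaussian kernel density estimate of `data` evaluated at `eval_pts`."""
    d = data.shape[1]
    diffs = eval_pts[:, None, :] - data[None, :, :]           # (m, n, d)
    sq = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq / (2 * h ** 2)).mean(axis=1) / ((2 * np.pi) ** (d / 2) * h ** d)

def kde_bootstrap_band(data, h, eval_pts, B=500, alpha=0.05, rng=None):
    """Sup-norm bootstrap band for the KDE evaluated on `eval_pts`."""
    rng = np.random.default_rng(rng)
    n = data.shape[0]
    p_hat = gaussian_kde(eval_pts, data, h)
    sups = np.empty(B)
    for b in range(B):
        boot = data[rng.integers(0, n, size=n)]                # bootstrap resample
        sups[b] = np.max(np.abs(gaussian_kde(eval_pts, boot, h) - p_hat))
    return np.quantile(sups, 1 - alpha)
```
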
Fri 11:30 a.m. - 11:55 a.m.

(note: the talk is 30 mins, but the server has problems with 12:00 noon)

We examine a number of non-linear dimensionality reduction techniques including Laplacian Eigenmaps, LLE, MVU, HLLE, LTSA, and t-SNE. In each case we show that the non-linear embedding can be characterized by a Laplacian or Laplacian-like operator. By comparing the resulting operators, one can uncover the similarities and differences between the methods. For example, HLLE and LTSA can be shown to be asymptotically identical, and whilst maximum variance unfolding (MVU) can be shown to generate a Laplacian, the behavior of the Laplacian is completely different from that generated by Laplacian Eigenmaps. We discuss the implications of this characterization for generating new non-linear dimensionality reduction methods and smoothness penalties.

Daniel Ting
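
To make the "Laplacian or Laplacian-like operator" concrete, here is a minimal sketch of the graph Laplacian construction underlying Laplacian Eigenmaps (Gaussian kernel weights, symmetric normalization); the talk's contribution is to compare the operators implied by the other embedding methods against this kind of operator. The bandwidth heuristic is an assumption for the sketch only.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, eps=None):
    """Minimal Laplacian Eigenmaps embedding from a Gaussian-kernel graph.

    Builds the symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    and returns the eigenvectors of its smallest nontrivial eigenvalues,
    mapped back to random-walk coordinates.
    """
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if eps is None:
        eps = np.median(sq_dists)                 # crude bandwidth heuristic
    W = np.exp(-sq_dists / eps)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(n) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    # drop the trivial 0-eigenvector; rescale to random-walk eigenvectors
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
```
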
Fri 1:30 p.m. - 2:00 p.m.
Poster session I (Poster)
Fri 2:00 p.m. - 3:00 p.m.
Multiscale characterization of molecular dynamics (Invited talk)
Cecilia Clementi
Fri 3:30 p.m. - 4:00 p.m.

Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, and Raúl Rabadán

Lorin Crawford
Fri 4:00 p.m. - 4:30 p.m.
Consistent manifold representation for TDA (Talk)
Timothy Sauer
Fri 5:00 p.m. - 6:00 p.m.

One aim of this workshop is to build connections between Topological Data Analysis on one side and Manifold Learning on the other. The moment has been reached when the mathematical, statistical and algorithmic foundations of both areas are mature enough -- it is now time to lay the foundations for joint topological and differential geometric understanding of data, and this discussion will explicitly focus on this process.

The second aim is to bring GDA closer to real applications. We see the challenge of real problems and real data as a motivator for researchers to explore new research questions, to reframe and expand the existing theory, and to step out of their own sub-area.

Frederic Chazal, Marina Meila
Sat 8:30 a.m. - 8:50 a.m.

Presentation and demo of the Gudhi library for Topological Data Analysis, followed by a presentation of the megaman package.

The aim of the presentations is to give beginners an introduction to the practical side of GDA and to give an overview of the software capabilities. The presenters will leave ample time for questions and will be available during the poster sessions for more detailed discussions and demos.

http://gudhi.gforge.inria.fr/
http://github.com/mmp2/megaman

Vincent Rouvreau, Marina Meila
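
For attendees who want to try the demonstrated pipeline themselves, below is a minimal sketch of a Rips persistence computation using Gudhi's Python bindings (assumed installed; class and method names follow recent Gudhi releases and may differ in older versions). The megaman package is not covered here.

```python
# Minimal Rips persistence with Gudhi's Python bindings (a sketch, API may vary by version).
import numpy as np
import gudhi

points = np.random.rand(100, 2)                       # toy point cloud
rips = gudhi.RipsComplex(points=points, max_edge_length=0.5)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()                  # list of (dimension, (birth, death))
print(simplex_tree.betti_numbers())
```
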
Sat 8:50 a.m. - 9:20 a.m.

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus, and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.

Jisu KIM
Sat 9:20 a.m. - 9:50 a.m.
Riemannian metric estimation and the problem of isometric embedding (Talk)
Dominique Perrault-Joncas
Sat 9:50 a.m. - 10:20 a.m.
Ordinal distance comparisons: from topology to geometry (Invited talk)
Ulrike von Luxburg
Sat 10:20 a.m. - 10:50 a.m.
Coffee break (Break)
Sat 10:50 a.m. - 11:30 a.m.

We invite the GDA community to discuss the good, the bad, and the ways forward for GDA software.

[NOTE THE START TIME: 10 minutes before the official end of the break]

Sat 11:30 a.m. - 11:55 a.m.
Poster session II (Poster)
Sat 2:00 p.m. - 2:30 p.m.

Modes, or modal-sets, are points or regions of space where the underlying data density is locally maximal. They are relevant in problems such as clustering and outlier detection, or can simply serve to identify salient structures in high-dimensional data (e.g. point-cloud data from medical imaging, astronomy, etc.).

In this talk we will argue that modal-sets, as general extremal surfaces, yield more stable clustering than the usual modes (extremal points) of a density. For one, modal-sets can be consistently estimated, at non-trivial convergence rates, despite the added complexity of unknown surface shape and dimension. Furthermore, modal-sets dovetail neatly with existing approaches that cluster data around point modes, yielding competitive yet more stable clustering on a range of real-world problems.

Samory Kpotufe
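
As a toy point of comparison, the sketch below clusters data around point modes using a kNN density estimate and hill climbing, i.e. the "usual modes" baseline the talk contrasts with modal-sets. It is hypothetical illustration code, not the modal-set estimator from the talk.

```python
import numpy as np

def knn_mode_clustering(X, k=10):
    """Toy point-mode clustering: kNN density plus hill climbing.

    Each point is linked to the highest-density point among itself and its
    k nearest neighbors; following these links to a fixed point assigns the
    sample to a density mode.
    """
    n, d = X.shape
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, 1:k + 1]           # k nearest neighbors
    r_k = dists[np.arange(n)[:, None], knn_idx][:, -1]        # distance to k-th neighbor
    density = 1.0 / (r_k ** d + 1e-12)                        # kNN density up to constants
    parent = np.empty(n, dtype=int)
    for i in range(n):
        cand = np.append(knn_idx[i], i)
        parent[i] = cand[np.argmax(density[cand])]            # highest-density candidate
    labels = np.arange(n)
    for i in range(n):                                        # follow links to a mode
        j = i
        while parent[j] != j:
            j = parent[j]
        labels[i] = j
    return labels, density
```
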
Sat 2:30 p.m. - 3:00 p.m.

We introduce the concept of community trees, which summarize topological structures within a network. A community tree is a tree structure representing clique communities from the clique percolation method (CPM). The community tree also generates a persistence diagram. Community trees and persistence diagrams reveal topological structures of the underlying networks and can be used as visualization tools. We study the stability of community trees and derive a quantity called the total star number (TSN) that provides an upper bound on the change of community trees. Our findings provide a topological interpretation for the stability of communities generated by the CPM.

Yen-Chi Chen
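
The clique communities that community trees summarize can be computed with networkx's clique percolation implementation; the sketch below lists CPM communities for a range of clique sizes on a toy graph. Constructing the community tree itself and the total star number are described in the paper.

```python
# Clique percolation communities (CPM) with networkx, over a range of clique sizes.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.karate_club_graph()                      # toy network
for k in range(2, 6):                           # clique sizes index a filtration
    comms = [frozenset(c) for c in k_clique_communities(G, k)]
    print(k, [sorted(c) for c in comms])
# Communities at larger k are nested inside those at smaller k, which is what
# allows organizing them into a community tree and reading off persistence.
```
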
Sat 3:30 p.m. - 4:00 p.m.

https://hal.inria.fr/hal-01159235/document

Frederic Cazals

Author Information

Marina Meila (University of Washington)
Frederic Chazal (INRIA)
Yu-Chia Chen (University of Washington)
