Timezone: »
Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of complex data. Despite several appealing properties, incorporating PDs in learning pipelines can be challenging because their natural geometry is not Hilbertian. Indeed, this was recently exemplified in a string of papers which show that the simple task of averaging a few PDs can be computationally prohibitive. We propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering. This framework builds upon a reformulation of PD metrics as optimal transport (OT) problems. Doing so, we can exploit recent computational advances: the OT problem on a planar grid, when regularized with entropy, is convex can be solved in linear time using the Sinkhorn algorithm and convolutions. This results in scalable computations that can stream on GPUs. We demonstrate the efficiency of our approach by carrying out clustering with diagrams metrics on several thousands of PDs, a scale never seen before in the literature.
Author Information
Theo Lacombe (Inria Saclay)
Marco Cuturi (Google Brain & CREST - ENSAE)
Marco Cuturi is a research scientist at Apple, in Paris. He received his Ph.D. in 11/2005 from the Ecole des Mines de Paris in applied mathematics. Before that he graduated from National School of Statistics (ENSAE) with a master degree (MVA) from ENS Cachan. He worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 11/2005 and 3/2007 and then in the financial industry between 4/2007 and 9/2008. After working at the ORFE department of Princeton University as a lecturer between 2/2009 and 8/2010, he was at the Graduate School of Informatics of Kyoto University between 9/2010 and 9/2016 as a tenured associate professor. He joined ENSAE in 9/2016 as a professor, where he is now working part-time. He was at Google between 10/2018 and 1/2022. His main employment is now with Apple, since 1/2022, as a research scientist working on fundamental aspects of machine learning.
Steve OUDOT (inria)
More from the Same Authors
-
2021 : Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi -
2021 : Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi -
2022 Poster: Supervised Training of Conditional Monge Maps »
Charlotte Bunne · Andreas Krause · Marco Cuturi -
2022 Poster: Efficient and Modular Implicit Differentiation »
Mathieu Blondel · Quentin Berthet · Marco Cuturi · Roy Frostig · Stephan Hoyer · Felipe Llinares-Lopez · Fabian Pedregosa · Jean-Philippe Vert -
2022 Poster: Low-rank Optimal Transport: Approximation, Statistics and Debiasing »
Meyer Scetbon · Marco Cuturi -
2021 Workshop: Optimal Transport and Machine Learning »
Jason Altschuler · Charlotte Bunne · Laetitia Chapel · Marco Cuturi · Rémi Flamary · Gabriel Peyré · Alexandra Suvorikova -
2020 Poster: Projection Robust Wasserstein Distance and Riemannian Optimization »
Tianyi Lin · Chenyou Fan · Nhat Ho · Marco Cuturi · Michael Jordan -
2020 Poster: Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm »
Tianyi Lin · Nhat Ho · Xi Chen · Marco Cuturi · Michael Jordan -
2020 Spotlight: Projection Robust Wasserstein Distance and Riemannian Optimization »
Tianyi Lin · Chenyou Fan · Nhat Ho · Marco Cuturi · Michael Jordan -
2020 Poster: Learning with Differentiable Pertubed Optimizers »
Quentin Berthet · Mathieu Blondel · Olivier Teboul · Marco Cuturi · Jean-Philippe Vert · Francis Bach -
2020 Poster: Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form »
Hicham Janati · Boris Muzellec · Gabriel Peyré · Marco Cuturi -
2020 Poster: Linear Time Sinkhorn Divergences using Positive Features »
Meyer Scetbon · Marco Cuturi -
2020 Oral: Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form »
Hicham Janati · Boris Muzellec · Gabriel Peyré · Marco Cuturi -
2020 Session: Orals & Spotlights Track 21: Optimization »
Peter Richtarik · Marco Cuturi -
2019 Workshop: Optimal Transport for Machine Learning »
Marco Cuturi · Gabriel Peyré · Rémi Flamary · Alexandra Suvorikova -
2019 Poster: Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections »
Boris Muzellec · Marco Cuturi -
2019 Poster: Differentiable Ranking and Sorting using Optimal Transport »
Marco Cuturi · Olivier Teboul · Jean-Philippe Vert -
2019 Spotlight: Differentiable Ranking and Sorting using Optimal Transport »
Marco Cuturi · Olivier Teboul · Jean-Philippe Vert -
2019 Poster: Tree-Sliced Variants of Wasserstein Distances »
Tam Le · Makoto Yamada · Kenji Fukumizu · Marco Cuturi -
2018 Poster: Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions »
Boris Muzellec · Marco Cuturi -
2017 Workshop: Optimal Transport and Machine Learning »
Olivier Bousquet · Marco Cuturi · Gabriel Peyré · Fei Sha · Justin Solomon -
2017 Tutorial: A Primer on Optimal Transport »
Marco Cuturi · Justin Solomon -
2016 Workshop: Time Series Workshop »
Oren Anava · Marco Cuturi · Azadeh Khaleghi · Vitaly Kuznetsov · Sasha Rakhlin -
2016 Poster: Wasserstein Training of Restricted Boltzmann Machines »
Grégoire Montavon · Klaus-Robert Müller · Marco Cuturi -
2016 Poster: Stochastic Optimization for Large-scale Optimal Transport »
Aude Genevay · Marco Cuturi · Gabriel Peyré · Francis Bach -
2015 Poster: Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric »
Vivien Seguy · Marco Cuturi -
2014 Workshop: Optimal Transport and Machine Learning »
Marco Cuturi · Gabriel Peyré · Justin Solomon · Alexander Barvinok · Piotr Indyk · Robert McCann · Adam Oberman -
2013 Poster: Sinkhorn Distances: Lightspeed Computation of Optimal Transport »
Marco Cuturi -
2013 Spotlight: Sinkhorn Distances: Lightspeed Computation of Optimal Transport »
Marco Cuturi -
2009 Poster: White Functionals for Anomaly Detection in Dynamical Systems »
Marco Cuturi · Jean-Philippe Vert · Alexandre d'Aspremont -
2006 Poster: Kernels on Structured Objects Through Nested Histograms »
Marco Cuturi · Kenji Fukumizu