

Poster

Improved Error Bounds for Tree Representations of Metric Spaces

Samir Chowdhury · Facundo Mémoli · Zane T Smith

Area 5+6+7+8 #27

Keywords: [ Clustering ] [ (Other) Unsupervised Learning Methods ] [ Nonlinear Dimension Reduction and Manifold Learning ]


Abstract:

Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, how well a metric space fits a tree depends on its inherent treeness, as well as on other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds that depend on the cardinality of the space. Because cardinality is a property of the underlying set alone, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound that depends only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov's method is dual to the well-known single linkage hierarchical clustering (SLHC) method. Through this duality, we transfer our results to the setting of SLHC, where such additive distortion bounds were previously unknown.
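To make the two constructions concrete, here is a minimal sketch assuming the textbook form of Gromov's tree approximation: fix a base point w, compute the Gromov product (x|y)_w = (d(x,w) + d(y,w) - d(x,y))/2, replace it by its max-min closure (x|y)'_w over chains of points, and set d_T(x,y) = d(x,w) + d(y,w) - 2(x|y)'_w. All function names are hypothetical and this is not the authors' code. One way to see the parallel with SLHC is that its ultrametric is the min-max closure of d over chains, i.e. the same path closure with the roles of max and min swapped. The per-base-point hyperbolicity delta_w is included only to illustrate the quantity the bound depends on; normalization conventions for hyperbolicity vary.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]


def gromov_product(d: Matrix, w: int) -> Matrix:
    """(x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2 for every pair x, y."""
    n = len(d)
    return [[(d[x][w] + d[y][w] - d[x][y]) / 2 for y in range(n)] for x in range(n)]


def maxmin_closure(s: Matrix) -> Matrix:
    """Max over chains x=x0,...,xk=y of the min step value (Floyd-Warshall style)."""
    n = len(s)
    h = [row[:] for row in s]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                h[i][j] = max(h[i][j], min(h[i][k], h[k][j]))
    return h


def minmax_closure(d: Matrix) -> Matrix:
    """Min over chains of the max step length: the single linkage ultrametric."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def gromov_tree_metric(d: Matrix, w: int = 0) -> Matrix:
    """d_T(x,y) = d(x,w) + d(y,w) - 2 (x|y)'_w.

    Since the closure can only increase Gromov products, d_T <= d entrywise.
    """
    g = maxmin_closure(gromov_product(d, w))
    n = len(d)
    return [[d[x][w] + d[y][w] - 2 * g[x][y] for y in range(n)] for x in range(n)]


def hyperbolicity_at(d: Matrix, w: int) -> float:
    """delta_w = max over x, y, z of min((x|z)_w, (y|z)_w) - (x|y)_w."""
    g = gromov_product(d, w)
    n = len(d)
    return max(min(g[x][z], g[y][z]) - g[x][y] for x, y, z in product(range(n), repeat=3))


# Usage: the 4-cycle with unit edges is not a tree metric.
d_c4 = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(gromov_tree_metric(d_c4, w=0))
print(minmax_closure(d_c4))
print(hyperbolicity_at(d_c4, w=0))
```

With base point 0 on the 4-cycle, the tree pseudometric collapses the antipodal points 1 and 3 to distance 0 while d(1,3) = 2, so the additive distortion is 2 and delta_0 = 1. On a tree metric such as three collinear points, delta_w = 0 and d_T reproduces d exactly. The triple loops keep the sketch short; they cost O(n^3) time.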
