Statistical Inference for Cluster Trees
Jisu KIM · Yen-Chi Chen · Sivaraman Balakrishnan · Alessandro Rinaldo · Larry Wasserman

Tue Dec 06 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #51

A cluster tree provides an intuitive summary of a density function that reveals essential structure about the high-density clusters. The true cluster tree is estimated from a finite sample from an unknown true density. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of different features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyzing their properties and assessing their suitability for our inference task. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we provide a variety of simulations to illustrate our proposed methods and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.

Author Information

Jisu KIM (Carnegie Mellon University)
Yen-Chi Chen (Carnegie Mellon University)
Sivaraman Balakrishnan (Carnegie Mellon University)
Alessandro Rinaldo (Carnegie Mellon University)
Larry Wasserman (Carnegie Mellon University)
