Timezone: »
Many clustering techniques aim at optimizing empirical criteria that are of the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk is proved to be of the order O(1/\sqrt{n}). Based on recent results related to the tail behavior of degenerate U-processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered.
Author Information
Stéphan Clémençon (Telecom ParisTech)
Related Events (a corresponding poster, oral, or spotlight)
-
2011 Spotlight: On U-processes and clustering performance »
Tue. Dec 13th 12:54 -- 12:58 PM Room
More from the Same Authors
-
2022 : Assessing Performance and Fairness Metrics in Face Recognition - Bootstrap Methods »
Jean-Rémy Conti · Stéphan Clémençon -
2017 Poster: Ranking Data with Continuous Labels through Oriented Recursive Partitions »
Stéphan Clémençon · Mastane Achab -
2015 Poster: SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk »
Guillaume Papa · Stéphan Clémençon · Aurélien Bellet -
2015 Poster: Extending Gossip Algorithms to Distributed Estimation of U-statistics »
Igor Colin · Aurélien Bellet · Joseph Salmon · Stéphan Clémençon -
2015 Spotlight: Extending Gossip Algorithms to Distributed Estimation of U-statistics »
Igor Colin · Aurélien Bellet · Joseph Salmon · Stéphan Clémençon -
2009 Poster: AUC optimization and the two-sample problem »
Stéphan Clémençon · Nicolas Vayatis · Marine Depecker