ResearchWorks Archive

A resampling approach to clustering with confidence


dc.contributor.advisor Stuetzle, Werner en_US Chiam, Yuan en_US 2012-09-13T17:40:42Z 2012-09-13T17:40:42Z 2012-09-13 2012 en_US
dc.identifier.other Chiam_washington_0250O_10458.pdf en_US
dc.description Thesis (Master's)--University of Washington, 2012 en_US
dc.description.abstract We propose a method for estimating the number of groups in a data set. Our method is an extension of Generalized Single Linkage clustering (GSL) (Stuetzle and Nugent 2010), a nonparametric clustering method based on the premise that groups in the data correspond to modes of the underlying data density. GSL starts with a nonparametric density estimate. It recursively splits the data into high density regions separated by valleys. The leaves of the resulting cluster tree correspond to modes of the density estimate. The problem is that nonparametric density estimates tend to have spurious modes due to sampling variability, giving rise to spurious splits in the cluster tree. We propose a resampling method aimed at assessing the significance of splits and a way of constructing a cluster tree making only significant splits. The only parameter is the significance level. Our method can identify highly non-linear groups. Simulation experiments suggest that the method is very conservative, which may explain its low power. en_US
dc.format.mimetype application/pdf en_US
dc.language.iso en_US en_US
dc.rights Copyright is held by the individual authors. en_US
dc.subject.other Statistics en_US
dc.title A resampling approach to clustering with confidence en_US
dc.type Thesis en_US
dc.embargo.terms No embargo en_US
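
The abstract's point about spurious modes can be illustrated with a small sketch. The Python code below is not the thesis's method and does not implement GSL. It is a simplified one-dimensional stand-in, assuming a Gaussian kernel density estimate with a fixed bandwidth `h` and a plain bootstrap. All function names (`kde_on_grid`, `count_modes`, `bootstrap_mode_frequency`) are hypothetical. The code counts the modes of the estimate and reports how often at least two modes reappear across resamples.

```python
import numpy as np


def kde_on_grid(x, grid, h):
    """Gaussian kernel density estimate of sample x, evaluated on grid.

    Assumption: fixed bandwidth h chosen by the caller.
    """
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))


def count_modes(density):
    """Count strict interior local maxima of a density sampled on a grid."""
    d = np.asarray(density, dtype=float)
    return int(np.sum((d[1:-1] > d[:-2]) & (d[1:-1] > d[2:])))


def bootstrap_mode_frequency(x, grid, h, n_boot=200, seed=0):
    """Fraction of bootstrap resamples whose KDE shows at least two modes.

    This is a stability heuristic only, not a significance test.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)  # resample with replacement
        if count_modes(kde_on_grid(xb, grid, h)) >= 2:
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(size=100)          # unimodal truth
    grid = np.linspace(-5.0, 5.0, 1001)
    for h in (0.1, 0.5):
        n = count_modes(kde_on_grid(sample, grid, h))
        f = bootstrap_mode_frequency(sample, grid, h)
        print(f"h={h}: modes={n}, bootstrap frequency of >=2 modes={f:.2f}")
```

A frequency near 1 suggests that a second mode is stable under resampling, while a low frequency hints that it may be sampling noise. Unlike the approach described in the abstract, this heuristic attaches no significance level to a split and builds no cluster tree.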
