A resampling approach to clustering with confidence

Title: A resampling approach to clustering with confidence
Author: Chiam, Yuan
Abstract: We propose a method for estimating the number of groups in a data set. Our method is an extension of Generalized Single Linkage clustering (GSL) (Stuetzle and Nugent 2010), a nonparametric clustering method based on the premise that groups in the data correspond to modes of the underlying data density. GSL starts with a nonparametric density estimate. It recursively splits the data into high density regions separated by valleys. The leaves of the resulting cluster tree correspond to modes of the density estimate. The problem is that nonparametric density estimates tend to have spurious modes due to sampling variability, giving rise to spurious splits in the cluster tree. We propose a resampling method aimed at assessing the significance of splits and a way of constructing a cluster tree making only significant splits. The only parameter is the significance level. Our method can identify highly non-linear groups. Simulation experiments suggest that the method is very conservative, which may explain its low power.
Description: Thesis (Master's)--University of Washington, 2012
URI: http://hdl.handle.net/1773/20906
Author requested restriction: No embargo
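
The abstract's point that nonparametric density estimates pick up spurious modes through sampling variability can be illustrated with a small one-dimensional sketch. This is not the thesis's procedure and not Generalized Single Linkage clustering: it only builds a Gaussian kernel density estimate, counts its modes on a grid, and repeats the count on bootstrap resamples to show how the number of modes can vary. The function names, the rule-of-thumb bandwidth, the grid size, and the number of resamples are all assumptions made for illustration.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(xs):
    """Silverman-style rule-of-thumb bandwidth (an assumed choice, not from the thesis)."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-1 / 5)


def count_modes(xs, bandwidth, grid_size=200):
    """Count local maxima of an unnormalized Gaussian kernel density estimate on a grid."""
    lo = min(xs) - 3 * bandwidth
    hi = max(xs) + 3 * bandwidth
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [
        sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in xs)
        for g in grid
    ]
    # A grid point is a mode if it rises above its left neighbor and is not
    # exceeded by its right neighbor; a flat two-point peak is counted once.
    return sum(
        1
        for i in range(1, grid_size - 1)
        if dens[i - 1] < dens[i] >= dens[i + 1]
    )


def bootstrap_mode_counts(xs, n_resamples=50, seed=0):
    """Mode counts of the density estimate over bootstrap resamples of xs."""
    rng = random.Random(seed)
    h = rule_of_thumb_bandwidth(xs)  # bandwidth fixed from the original sample
    counts = []
    for _ in range(n_resamples):
        resample = [rng.choice(xs) for _ in xs]  # sample with replacement
        counts.append(count_modes(resample, h))
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: two groups of 30 points each.
    sample = [rng.gauss(0, 1) for _ in range(30)] + [rng.gauss(6, 1) for _ in range(30)]
    counts = bootstrap_mode_counts(sample)
    for k in sorted(set(counts)):
        print(f"{k} mode(s): {counts.count(k)} of {len(counts)} resamples")
```

A split that shows up in only a small share of resamples would be a candidate for a spurious one; how the thesis turns this kind of variability into a formal significance test for splits in the cluster tree is described in the full text, not reproduced here.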

Files in this item

File: Chiam_washington_0250O_10458.pdf
Size: 588.8 KB
Format: PDF
