
dc.contributor.advisor: Stuetzle, Werner [en_US]
dc.contributor.author: Chiam, Yuan [en_US]
dc.date.accessioned: 2012-09-13T17:40:42Z
dc.date.available: 2012-09-13T17:40:42Z
dc.date.issued: 2012-09-13
dc.date.submitted: 2012 [en_US]
dc.identifier.other: Chiam_washington_0250O_10458.pdf [en_US]
dc.identifier.uri: http://hdl.handle.net/1773/20906
dc.description: Thesis (Master's)--University of Washington, 2012 [en_US]
dc.description.abstract: We propose a method for estimating the number of groups in a data set. Our method is an extension of Generalized Single Linkage clustering (GSL) (Stuetzle and Nugent 2010), a nonparametric clustering method based on the premise that groups in the data correspond to modes of the underlying data density. GSL starts with a nonparametric density estimate. It recursively splits the data into high density regions separated by valleys. The leaves of the resulting cluster tree correspond to modes of the density estimate. The problem is that nonparametric density estimates tend to have spurious modes due to sampling variability, giving rise to spurious splits in the cluster tree. We propose a resampling method aimed at assessing the significance of splits and a way of constructing a cluster tree making only significant splits. The only parameter is the significance level. Our method can identify highly non-linear groups. Simulation experiments suggest that the method is very conservative, which may explain its low power. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.language.iso: en_US [en_US]
dc.rights: Copyright is held by the individual authors. [en_US]
dc.subject.other: Statistics [en_US]
dc.title: A resampling approach to clustering with confidence [en_US]
dc.type: Thesis [en_US]
dc.embargo.terms: No embargo [en_US]
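
The abstract describes splitting data at valleys of a density estimate and notes that such estimates can show spurious modes. The following is a minimal one-dimensional sketch of that general idea only, not the thesis's method or the GSL algorithm: the function names (kde_on_grid, find_modes, split_at_valley), the Gaussian kernel, the bandwidths, and the simulated data are all assumptions made for illustration, and the resampling-based significance test is not shown because the abstract does not specify it.

```python
import numpy as np

def kde_on_grid(sample, grid, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at each grid point."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(sample) * bandwidth)

def find_modes(density):
    """Indices of strict interior local maxima of a density given on a grid."""
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner > density[2:])
    return np.flatnonzero(is_peak) + 1

def split_at_valley(sample, grid, density):
    """Split the sample at the lowest point between the two highest modes.

    Returns None when the estimate has fewer than two modes.
    """
    modes = find_modes(density)
    if len(modes) < 2:
        return None
    # Two highest modes, sorted into left-to-right grid order.
    lo, hi = np.sort(modes[np.argsort(density[modes])[-2:]])
    valley = lo + np.argmin(density[lo:hi + 1])
    cut = grid[valley]
    return sample[sample <= cut], sample[sample > cut]

# Simulated data (an assumption): two well-separated groups of 100 points each.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3.0, 0.5, 100), rng.normal(3.0, 0.5, 100)])
grid = np.linspace(-6.0, 6.0, 601)

# A small bandwidth tends to produce extra modes from sampling variability.
smooth = kde_on_grid(sample, grid, bandwidth=0.5)
rough = kde_on_grid(sample, grid, bandwidth=0.05)
print("modes at bandwidth 0.5: ", len(find_modes(smooth)))
print("modes at bandwidth 0.05:", len(find_modes(rough)))

left, right = split_at_valley(sample, grid, smooth)
print("group sizes:", len(left), len(right))
```

Comparing the two mode counts gives a rough feel for why a split suggested by a single density estimate may need a significance check before it is accepted.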

