dc.contributor.author | Yeung, Ka Yee | en_US
dc.date.accessioned | 2009-10-06T16:58:10Z
dc.date.available | 2009-10-06T16:58:10Z
dc.date.issued | 2001 | en_US
dc.identifier.other | b47487811 | en_US
dc.identifier.other | 50489552 | en_US
dc.identifier.other | Thesis 51133 | en_US
dc.identifier.uri | http://hdl.handle.net/1773/6986
dc.description | Thesis (Ph. D.)--University of Washington, 2001 | en_US
dc.description.abstract | The invention of DNA microarrays allows us to study simultaneous variations of genes at the genome-wide scale. A typical gene expression data set consists of thousands or even tens of thousands of genes, and a few dozen experiments. Cluster analysis is the art of finding groups in a given data set such that objects in the same group are similar to each other while objects in different groups are dissimilar. There are many applications for clustering gene expression data. Many different clustering algorithms and analytical techniques have been applied to gene expression data. Success of various analytical methodologies in specific instances has been reported, but extensive quantitative evaluations of clustering methodologies are rare. Since different analytical approaches may produce different clustering results, there is a great need to evaluate clustering techniques in order to choose an appropriate approach. An underlying theme of this dissertation is systematic evaluations of clustering methodologies on gene expression data. Specifically, we proposed a data-driven methodology, called the figure of merit (FOM) methodology, to compare the quality of clusters from heuristic-based clustering algorithms. We also showed that the model-based clustering approach, which assumes the Gaussian mixture model, produces relatively high quality clusters. The probabilistic framework in the model-based approach allows us to infer the correct number of clusters, and to compare different models. Moreover, we investigated the effectiveness of a dimension reduction technique called principal component analysis as a pre-processing step before cluster analysis. Our main contributions are evaluation methodologies of analytical techniques in clustering gene expression data. We employed an external validation approach, which evaluates clustering results by comparing to external prior knowledge of the data, to assess the performance of internal validation approaches, which do not require any external knowledge of the data. In particular, we showed that our FOM methodology and the model-based approach, which do not require any external knowledge of the data, produce comparisons of clustering algorithms that are consistent with comparisons to external knowledge. Since external knowledge is seldom available for gene expression data, our work provides practical evaluation frameworks for assessing clustering results on gene expression data. | en_US
dc.format.extent | x, 144 p. | en_US
dc.language.iso | en_US | en_US
dc.rights | Copyright is held by the individual authors. | en_US
dc.rights.uri |  | en_US
dc.subject.other | Theses--Computer science and engineering | en_US
dc.title | Cluster analysis of gene expression data | en_US
dc.type | Thesis | en_US

