Motif-based mining of protein sequences

Liu, Agatha H

dc.contributor.author	Liu, Agatha H	en_US
dc.date.accessioned	2009-10-06T16:50:27Z
dc.date.available	2009-10-06T16:50:27Z
dc.date.issued	2002	en_US
dc.description	Thesis (Ph. D.)--University of Washington, 2002	en_US
dc.description.abstract	We introduce CASTOR, an automatic, unsupervised system for protein motif discovery and classification. Given amino acid sequences for a group of proteins, CASTOR generates statistically significant motifs and constructs a classification of the proteins by performing motif discovery and refinement in a top-down and recursive manner. The members of each class are likely to share a function, and the motifs associated with the class are likely to account for the function. We evaluate CASTOR's performance on the G protein-coupled receptor (GPCR) superfamily. The results show that the CASTOR-constructed classification is in better agreement with a manually curated classification than one constructed by another automatic, unsupervised system based on pairwise, global sequence similarity. Furthermore, while manually constructed classifications tend to be hierarchical, the CASTOR-constructed ones that are non-hierarchical suggest that complex functional relationships among classes may be more abundant than expected. We also apply CASTOR to the mammalian olfactory receptor family, for which very little functional information is available. We infer the potential functional roles associated with the generated motifs and classes by integrating various complex data, such as mutation experiments and ligand binding assays. Among other functional insights gained, we obtain results that support previous hypotheses on structural integrity and post-translational modification. We also propose and provide evidence for a combinatorial molecular mechanism that supports and potentially explains the ligand binding behavior. We additionally define sub-sequences that capture structural features of these receptors and study the motifs present in the sub-sequences. Finally, we introduce CASTOR+, an automatic, supervised system for protein classification. CASTOR+ adds new proteins to a pre-existing classification where each class is associated with specific motifs, such as that generated by CASTOR, by matching selected motifs in the given classification against each new protein. We evaluate the performance of CASTOR+ on the GPCR superfamily. We find that it performs almost as well as an approach based on pairwise, global sequence similarity in terms of classifying proteins against the bottom level of the manually curated classification. Furthermore, it often succeeds even as the other approach fails when the new proteins have no close homologues in the pre-existing classification.	en_US
dc.format.extent	ix, 140 p.	en_US
dc.identifier.other	b48359944	en_US
dc.identifier.other	51279631	en_US
dc.identifier.other	Thesis 51524	en_US
dc.identifier.uri	http://hdl.handle.net/1773/6894
dc.language.iso	en_US	en_US
dc.rights	Copyright is held by the individual authors.	en_US
dc.rights.uri		en_US
dc.subject.other	Theses--Computer science and engineering	en_US
dc.title	Motif-based mining of protein sequences	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: 3053531.pdf
Size: 6.22 MB
Format: Adobe Portable Document Format

Collections

Computer science and engineering