Motif-based mining of protein sequences

dc.contributor.author: Liu, Agatha H [en_US]
dc.date.accessioned: 2009-10-06T16:50:27Z
dc.date.available: 2009-10-06T16:50:27Z
dc.date.issued: 2002 [en_US]
dc.description: Thesis (Ph. D.)--University of Washington, 2002 [en_US]
dc.description.abstract: We introduce CASTOR, an automatic, unsupervised system for protein motif discovery and classification. Given amino acid sequences for a group of proteins, CASTOR generates statistically significant motifs and constructs a classification of the proteins by performing motif discovery and refinement in a top-down and recursive manner. The members of each class are likely to share a function, and the motifs associated with the class are likely to account for the function. We evaluate CASTOR's performance on the G protein-coupled receptor (GPCR) superfamily. The results show that the CASTOR-constructed classification is in better agreement with a manually curated classification than one constructed by another automatic, unsupervised system based on pairwise, global sequence similarity. Furthermore, while manually constructed classifications tend to be hierarchical, the CASTOR-constructed ones that are non-hierarchical suggest that complex functional relationships among classes may be more abundant than expected. We also apply CASTOR to the mammalian olfactory receptor family, for which very little functional information is available. We infer the potential functional roles associated with the generated motifs and classes by integrating various complex data, such as mutation experiments and ligand binding assays. Among other functional insights gained, we obtain results that support previous hypotheses on structural integrity and post-translational modification. We also propose and provide evidence for a combinatorial molecular mechanism that supports and potentially explains the ligand binding behavior. We additionally define sub-sequences that capture structural features of these receptors and study the motifs present in the sub-sequences. Finally, we introduce CASTOR+, an automatic, supervised system for protein classification. CASTOR+ adds new proteins to a pre-existing classification where each class is associated with specific motifs, such as that generated by CASTOR, by matching selected motifs in the given classification against each new protein. We evaluate the performance of CASTOR+ on the GPCR superfamily. We find that it performs almost as well as an approach based on pairwise, global sequence similarity in terms of classifying proteins against the bottom level of the manually curated classification. Furthermore, it often succeeds even as the other approach fails when the new proteins have no close homologues in the pre-existing classification. [en_US]
dc.format.extent: ix, 140 p. [en_US]
dc.identifier.other: b48359944 [en_US]
dc.identifier.other: 51279631 [en_US]
dc.identifier.other: Thesis 51524 [en_US]
dc.identifier.uri: http://hdl.handle.net/1773/6894
dc.language.iso: en_US [en_US]
dc.rights: Copyright is held by the individual authors. [en_US]
dc.rights.uri: [en_US]
dc.subject.other: Theses--Computer science and engineering [en_US]
dc.title: Motif-based mining of protein sequences [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: 3053531.pdf
Size: 6.22 MB
Format: Adobe Portable Document Format