A practically efficient graph-theoretic approach to protein identification in mass spectrometry
Tandem mass spectrometry has emerged as a powerful tool for characterizing complex protein samples, an increasingly important problem in biology. The effort to perform inference efficiently and accurately on data from tandem mass spectrometry experiments has produced several statistical methods. This thesis rephrases existing methods in a common framework, categorizes them, and discusses them in detail. Each method is evaluated by considering the nature of its approach together with the methodology and outcome of published comparisons to other methods; this analysis is used to comment on the strengths, weaknesses, and overall utility of each method. Building on that analysis, the thesis proposes Fido, a novel Bayesian approach to protein identification, and demonstrates that Fido equals or surpasses state-of-the-art methods. Fido uses a simple generative model of the tandem mass spectrometry process and employs graph transformations to perform inference efficiently. These graph transformations are then combined with formal graph-theoretic inference procedures, both to make inference more efficient and to enable inference on the more complex graphs that arise from more sophisticated models of the tandem mass spectrometry process. Extensions of the simple Bayesian model, as well as new directions for the field, are proposed; these changes would help formalize, unify, and improve upon qualitatively similar techniques employed by existing methods. A formalized approach improves the quality and reliability with which proteins can be identified in complex mixtures.
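To make the idea of Bayesian protein inference on a protein-peptide graph concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a noisy-OR style generative model that the abstract does not spell out: the parameters `alpha` (chance that a present protein yields a given peptide), `beta` (chance that a peptide appears spuriously), and `gamma` (prior probability that a protein is present), the function name `protein_posteriors`, and the way peptide scores enter the likelihood are all illustrative assumptions. It enumerates every protein configuration, which is exponential in the number of proteins; it does not implement the graph transformations the abstract describes for making inference efficient.

```python
from itertools import product


def protein_posteriors(edges, peptide_scores, alpha=0.1, beta=0.01, gamma=0.5):
    """Brute-force posterior P(protein present | peptide scores).

    edges: dict mapping each peptide to the list of proteins that contain it.
    peptide_scores: dict mapping each peptide to a score in [0, 1], treated
        here (an assumption) as the probability that the peptide is present.
    alpha, beta, gamma: hypothetical model parameters, see the text above.
    """
    proteins = sorted({p for parents in edges.values() for p in parents})
    marginals = dict.fromkeys(proteins, 0.0)
    total = 0.0

    # Enumerate every present/absent assignment of the proteins.
    for states in product([0, 1], repeat=len(proteins)):
        present = dict(zip(proteins, states))
        n_present = sum(states)

        # Independent prior over protein presence.
        weight = gamma ** n_present * (1 - gamma) ** (len(proteins) - n_present)

        for peptide, parents in edges.items():
            k = sum(present[p] for p in parents)
            # Noisy-OR: peptide appears via noise or via any present parent.
            p_peptide = 1 - (1 - beta) * (1 - alpha) ** k
            s = peptide_scores[peptide]
            # Sum over the unobserved presence/absence of the peptide.
            weight *= s * p_peptide + (1 - s) * (1 - p_peptide)

        total += weight
        for p in proteins:
            if present[p]:
                marginals[p] += weight

    return {p: m / total for p, m in marginals.items()}


if __name__ == "__main__":
    # Toy graph: pep2 is shared between proteins A and B.
    edges = {"pep1": ["A"], "pep2": ["A", "B"], "pep3": ["B"]}
    scores = {"pep1": 0.95, "pep2": 0.90, "pep3": 0.05}
    for protein, prob in protein_posteriors(edges, scores).items():
        print(protein, round(prob, 3))
```

In this toy graph the shared peptide alone cannot distinguish A from B; the unique peptides decide the ranking, which is the kind of ambiguity that protein-level inference has to resolve.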
- Genetics