A practically efficient graph-theoretic approach to protein identification in mass spectrometry
Abstract
Tandem mass spectrometry has emerged as a powerful tool for the characterization of complex
protein samples, an increasingly important problem in biology. The effort to efficiently and accurately
perform inference on data from tandem mass spectrometry experiments has resulted in
several statistical methods. This thesis rephrases existing methods using a common framework,
categorizes them, and discusses them in detail. Each method is analyzed and evaluated by
considering the nature of its approach and the outcome and methodology of published comparisons
to other methods; this analysis of existing comparisons is used to comment on
the strengths and weaknesses, as well as the overall utility, of these methods. The analysis
also motivates Fido, a novel Bayesian approach to protein identification;
Fido is demonstrated to equal or surpass the state-of-the-art methods. Fido uses a simple
generative model of the tandem mass spectrometry process, and employs graph transformations
to perform inference efficiently. These graph transformations are then combined with formal
graph-theoretic inference procedures to increase the efficiency of inference and facilitate inference
on more complex graphs resulting from more sophisticated models of the tandem mass
spectrometry process. Extensions of the simple Bayesian model, as well as new directions for
the field, are proposed; these proposed changes will help formalize, unify, and improve upon
qualitatively similar techniques that are employed by existing methods. A formalized approach
improves the quality and reliability with which proteins can be identified in complex mixtures.
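
To make the idea of Bayesian protein identification on a protein-peptide graph concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a noisy-OR generative model on a bipartite graph with binary peptide observations. The parameter names alpha (probability that a present protein emits a given peptide), beta (probability that a peptide appears from noise alone), and gamma (prior probability that a protein is present), their default values, and the function name protein_posteriors are illustrative assumptions that do not come from the abstract. Inference here is brute-force enumeration over all protein states, which is exponential in the number of proteins; it uses none of the graph transformations the abstract credits for efficient inference.

```python
from itertools import product


def protein_posteriors(protein_to_peptides, observed, alpha=0.9, beta=0.05, gamma=0.5):
    """Posterior probability that each protein is present (illustrative sketch).

    protein_to_peptides: dict mapping a protein name to the set of its peptides.
    observed: set of peptides that were observed.
    alpha, beta, gamma: assumed model parameters (not taken from the abstract).
    """
    proteins = sorted(protein_to_peptides)
    peptides = sorted(set().union(*protein_to_peptides.values()))

    total = 0.0
    present_mass = {p: 0.0 for p in proteins}

    # Enumerate every present/absent assignment of the proteins (exponential cost).
    for state in product([0, 1], repeat=len(proteins)):
        present = {p for p, s in zip(proteins, state) if s}

        # Prior: each protein is present independently with probability gamma.
        weight = 1.0
        for s in state:
            weight *= gamma if s else 1.0 - gamma

        # Likelihood: noisy-OR over the present parent proteins of each peptide.
        for pep in peptides:
            n_parents = sum(1 for p in present if pep in protein_to_peptides[p])
            p_emit = 1.0 - (1.0 - beta) * (1.0 - alpha) ** n_parents
            weight *= p_emit if pep in observed else 1.0 - p_emit

        total += weight
        for p in present:
            present_mass[p] += weight

    # Normalize the accumulated joint mass into per-protein marginals.
    return {p: present_mass[p] / total for p in proteins}


# Hypothetical toy graph: protein A has a unique peptide p1 and shares p2 with B.
graph = {"A": {"p1", "p2"}, "B": {"p2"}, "C": {"p3"}}
posteriors = protein_posteriors(graph, observed={"p1", "p2"})
```

In this toy example, protein A is supported by a peptide that no other protein explains, so it receives a higher posterior than B, whose only peptide is shared with A. Protein C, whose only peptide was not observed, receives a low posterior.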