Bioinformatics of proteomic tandem mass spectra: selection, characterization, and identification
Tandem mass spectrometry is a powerful technology for proteomics. Quadrupole ion traps can isolate the ions of a particular peptide, fragment them through collision-induced dissociation, and record the resulting fragment ions in tandem mass spectra. Database search algorithms such as Mascot and SEQUEST can then identify the peptides represented by a collection of these spectra. These spectra, however, have not been extensively characterized, and as a result these algorithms model fragment ions inaccurately. In this body of research, a new algorithm, "DTASelect," was created to summarize, filter, and compare the identifications produced by database search algorithms. The extent and significance of spectral similarity in proteomic collections were explored. A set of well-identified peptides was statistically characterized to demonstrate the impact of peptide sequence on fragmentation. This information led to a new fragmentation model, which in turn made possible a new algorithm, "GutenTag," that identifies peptides through an automated, accurate sequence tagging approach. Taken together, this research shows that more accurate models of fragmentation can both improve existing algorithms and make new classes of algorithms feasible.
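To make "modeling fragment ions" concrete, here is a minimal sketch of the simplest such model. It assumes that every peptide bond cleaves equally often, that only singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments) appear, and that observed peaks are matched within a fixed m/z tolerance. The function names `fragment_ions` and `count_matched_peaks` and the 0.5 tolerance are hypothetical illustrations. This is not the scoring used by Mascot, SEQUEST, DTASelect, or GutenTag.

```python
# Hypothetical sketch of a uniform b/y fragmentation model (not any named tool's method).
# Assumption: unmodified residues, singly charged ions, every cleavage equally likely.

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_ions(peptide):
    """Return singly charged b- and y-ion m/z values for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b_i carries the first i residues; y_i carries the last i residues plus water.
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return {"b": b_ions, "y": y_ions}


def count_matched_peaks(peptide, observed_mz, tolerance=0.5):
    """Count predicted b/y ions lying within `tolerance` of any observed peak."""
    ions = fragment_ions(peptide)
    predicted = ions["b"] + ions["y"]
    return sum(
        any(abs(mz - peak) <= tolerance for peak in observed_mz)
        for mz in predicted
    )


if __name__ == "__main__":
    # Dipeptide GA: b1 is about 58.03 and y1 is about 90.05, so two peaks match.
    print(count_matched_peaks("GA", [58.03, 90.05, 200.0]))  # prints 2
```

Because this model treats every cleavage identically and ignores the peptide sequence, it shows the kind of simplification that the statistical characterization of fragmentation described above is meant to replace.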