Translating mass spectra to peptides with deep learning

dc.contributor.advisor: Noble, William S
dc.contributor.author: YILMAZ, Melih
dc.date.accessioned: 2025-05-12T22:46:25Z
dc.date.available: 2025-05-12T22:46:25Z
dc.date.issued: 2025-05-12
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Tandem mass spectrometry is the leading technique to study proteins at scale, and a fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information (de novo peptide sequencing) is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge, in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. In this work, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that make up the generating peptide. Casanovo is trained on a repository-scale dataset, and it significantly advances the state of the art in de novo peptide sequencing. We show that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome. Finally, we go beyond the de novo peptide sequencing problem and demonstrate Casanovo's capabilities as a foundation model in mass spectrometry proteomics.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: YILMAZ_washington_0250E_27822.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52956
dc.language.iso: en_US
dc.rights: none
dc.subject: De novo peptide sequencing
dc.subject: Machine Learning
dc.subject: Mass Spectrometry
dc.subject: Proteomics
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Translating mass spectra to peptides with deep learning
dc.type: Thesis

Files

Original bundle

Name: YILMAZ_washington_0250E_27822.pdf
Size: 13.14 MB
Format: Adobe Portable Document Format