Powering phosphoproteomics with large scale data analysis and machine learning

dc.contributor.advisor: Villén, Judit
dc.contributor.author: Barente, Anthony Scott
dc.date.accessioned: 2022-07-14T22:11:50Z
dc.date.available: 2022-07-14T22:11:50Z
dc.date.issued: 2022-07-14
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: Cells are the fundamental biological units of organisms and constantly change their internal state in response to external stimuli and stresses. A common way in which they do this is through the addition and removal of chemical tags on proteins, which allows cells to exert fine-grained control over protein activity. One of these tags, phosphorylation, is unique for its essential role in signaling cascades. By linking together chains of proteins that turn each other on and off through phosphorylation, cells can build sophisticated networks capable of transforming stimuli into the appropriate biological response. High-throughput tools such as mass spectrometry are ideal for studying phosphorylation, as they provide the capability to track the dynamics of thousands of modified sites across treatments. In recent years, this technique has only become more popular, with the number of submissions to public repositories for mass spectrometry data growing every month. By bringing together multiple phosphorylation studies into one dataset, we have the potential to learn fundamental properties of how phosphopeptides behave across instruments and to improve our assays. Beyond the growing number of datasets, individual phosphoproteomics datasets have continued to grow in size as sample preparation and data acquisition technologies improve. While this growth allows more conditions and subjects to be included in a single study, it brings fundamental computational and statistical challenges. Within this thesis, I will present two stories that explore these avenues of research. First, I will present the analysis of a large-scale yeast phosphoproteomics perturbation screen. With this, I will show how the comparison of phosphosite dynamics across multiple treatments can lead to prioritized targets for further research and provide valuable information about the regulatory relationships between phosphosites.
After this analysis, I will present my efforts to build a centralized resource for building targeted phosphoproteomics assays. Here I will first present pyAscore, a versatile and fast Python package for performing an essential step in phosphopeptide identification. Then, I will detail an automated and reproducible pipeline for integrating publicly available phosphoproteomics data into a centralized knowledgebase, Phosphopedia 2.0. Finally, I will present work to predict phosphopeptide retention time and charge state from amino acid sequence, which has allowed Phosphopedia 2.0 to move beyond detections and provide information about any phosphopeptide.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Barente_washington_0250E_24562.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49028
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Big Data
dc.subject: Cell Signalling
dc.subject: Mass Spectrometry
dc.subject: Phosphorylation
dc.subject: Proteomics
dc.subject: Molecular biology
dc.subject: Computer science
dc.subject: Statistics
dc.subject.other: Genetics
dc.title: Powering phosphoproteomics with large scale data analysis and machine learning
dc.type: Thesis

Files

Original bundle

Name: Barente_washington_0250E_24562.pdf
Size: 10.02 MB
Format: Adobe Portable Document Format
