Powering phosphoproteomics with large scale data analysis and machine learning

Barente, Anthony Scott

dc.contributor.advisor	Villén, Judit
dc.contributor.author	Barente, Anthony Scott
dc.date.accessioned	2022-07-14T22:11:50Z
dc.date.available	2022-07-14T22:11:50Z
dc.date.issued	2022-07-14
dc.date.submitted	2022
dc.description	Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract	Cells are the fundamental biological units of organisms and are constantly changing their internal state in response to external stimuli and stresses. A common way in which they do this is through the addition and removal of chemical tags on proteins, which allows the cells to exert fine-grained control over protein activity. One of these tags, phosphorylation, is unique for its essential role in signaling cascades. By linking together chains of proteins that turn each other on and off through phosphorylation, cells can build sophisticated networks capable of transforming stimuli into the appropriate biological response. High-throughput tools such as mass spectrometry are ideal for studying phosphorylation, as they provide the capability to track the dynamics of thousands of modified sites across treatments. In recent years, this technique has become increasingly popular, with the number of submissions to public repositories for mass spectrometry data growing every month. By bringing together multiple phosphorylation studies into one dataset, we have the potential to learn fundamental properties of how phosphopeptides behave across instruments and to improve our assays. Beyond the growing number of datasets, individual phosphoproteomics datasets have continued to grow in size as sample preparation and data acquisition technologies improve. While this growth allows for more conditions and subjects to be included in a single study, it brings fundamental computational and statistical challenges. Within this thesis, I will present two stories that explore these avenues of research. First, I will present the analysis of a large-scale yeast phosphoproteomics perturbation screen. With this, I will show how the comparison of phosphosite dynamics across multiple treatments can lead to prioritized targets for further research and provide valuable information about the regulatory relationships between phosphosites.
After this analysis, I will present my efforts to create a centralized resource for building targeted phosphoproteomics assays. Here I will first present pyAscore, a versatile and fast Python package for performing an essential step in phosphopeptide identification. Then, I will detail an automated and reproducible pipeline for integrating publicly available phosphoproteomics data into a centralized knowledgebase, Phosphopedia 2.0. Finally, I will present work to predict phosphopeptide retention time and charge state from amino acid sequence, which has allowed Phosphopedia 2.0 to move beyond detections and provide information about any phosphopeptide.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Barente_washington_0250E_24562.pdf
dc.identifier.uri	http://hdl.handle.net/1773/49028
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Big Data
dc.subject	Cell Signalling
dc.subject	Mass Spectrometry
dc.subject	Phosphorylation
dc.subject	Proteomics
dc.subject	Molecular biology
dc.subject	Computer science
dc.subject	Statistics
dc.subject.other	Genetics
dc.title	Powering phosphoproteomics with large scale data analysis and machine learning
dc.type	Thesis

Files

Original bundle

Name: Barente_washington_0250E_24562.pdf
Size: 10.02 MB
Format: Adobe Portable Document Format

Collections

Genetics