Baker, David
Mansoor, Sanaa
2023-08-14
2023-08-14
2023-08-14
2023
Mansoor_washington_0250E_25237.pdf
http://hdl.handle.net/1773/50189
Thesis (Ph.D.)--University of Washington, 2023

The structure and function of proteins are encoded by their amino acid sequences. The field of protein design aims to uncover the fundamental connection between protein sequence, structure, and function to design novel proteins with important applications in fields such as medicine, biotechnology, and materials science. The complex relationship between protein sequence, structure, and function makes protein design a challenging task. In recent years, learned embeddings have emerged as a powerful tool to help deconvolute this relationship. Learned embeddings can convert high-dimensional protein data, such as protein sequences and structures, into small vectors of biologically relevant information. By capturing all the essential features of a protein in a compact form, embeddings enable the use of machine learning techniques for protein design. My PhD research has focused on generating meaningful learned embeddings of proteins and then harnessing them for various downstream predictions. For studying protein ensembles and protein structure refinement, I developed embeddings through training generative models on two-dimensional structural data, followed by three-dimensional structural modeling. By incorporating sequence information, a joint representation of protein sequence and structure was developed for predicting the effects of single mutations on protein thermal stability. Finally, following the development and success of an accurate structure prediction model, RoseTTAFold, the embeddings learned from this model were used for “zero-shot” or unsupervised prediction of the effect of point mutations on protein stability and function.
These successes demonstrate the importance of using learned protein embeddings for protein design and highlight the need for further research in this area to facilitate the creation of novel proteins with desired properties.

application/pdf
en-US
CC BY-NC-SA
deep learning
embeddings
protein design
Computer science
Biochemistry
Molecular engineering
Generating and Harnessing Learned Embeddings for Protein Design
Thesis