Exploring the Trinity of Protein Science: Structure, Stability, and Function Through the Lens of Machine Learning
Abstract
Machine learning and deep learning are revolutionizing protein science by enabling the prediction of complex, emergent biophysical properties. This thesis presents two novel computational models that use these technologies to predict protein thermostability and function, and illustrates how such models can serve as powerful hypothesis generators within iterative "design, build, test, and learn" cycles. Chapter 2 details NOMELT, a generative model trained as a neural machine translator between mesophilic and thermophilic protein domains; it uses a vast new dataset of homologous protein pairs to enhance the stability of generated thermophilic sequences. Chapter 3 introduces the PairProphet pipeline, which integrates diverse sequence and structural data to predict functional similarities between protein pairs with high accuracy, highlighting the importance of sequence-based features and the potential limitations of current structural analysis techniques. The thesis suggests that integrating ecological information and pangenomic analyses could further enhance the predictive power of these models and identifies these approaches as promising areas for future research. This work contributes to a deeper understanding of protein behavior under diverse environmental conditions and suggests pathways to designing proteins more effectively for therapeutic and industrial applications.
Description
Thesis (Master's)--University of Washington, 2024
