Explainable Machine Learning and Applications in Protein-Ligand Complex Structure Prediction

dc.contributor.advisor: Baker, David
dc.contributor.author: Sturmfels, Pascal
dc.date.accessioned: 2024-09-09T23:06:25Z
dc.date.available: 2024-09-09T23:06:25Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: This thesis touches upon two main topics: interpreting machine learning models, and the application of machine learning to protein sequences and structures. The first portion deals with feature attribution techniques, which assign scores to a model's input features on a per-instance basis and thereby represent the model as locally linear around that instance. Three methods are proposed (expected gradients, attribution priors, and integrated Hessians) that extend interpretability beyond feature attribution towards feature interaction and training more interpretable models. The second portion deals with training protein language models and how best to design semi-supervised pre-training tasks. It takes inspiration from multiple sequence alignments to propose two tasks - profile prediction and seq2msa - that extend language modeling beyond autoregressive and masked language modeling. The third portion deals with applications in protein structure prediction, chiefly predicting the structure of proteins in concert with small molecules. Jointly determining both the structure of a protein from its sequence and how small-molecule binding partners dock to that structure remains an open and challenging problem, with applications in biological discovery, virtual screening, and de novo design. This thesis discusses the development of a structure prediction network, RoseTTAFold All-Atom, capable of simultaneous folding and docking, as well as some applications enabled by that network.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Sturmfels_washington_0250E_26746.pdf
dc.identifier.uri: https://hdl.handle.net/1773/51872
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Computer science
dc.subject: Biochemistry
dc.subject: Bioinformatics
dc.subject.other: Computer science and engineering
dc.title: Explainable Machine Learning and Applications in Protein-Ligand Complex Structure Prediction
dc.type: Thesis

Files

Original bundle

Name: Sturmfels_washington_0250E_26746.pdf
Size: 5.88 MB
Format: Adobe Portable Document Format