Sequence-Specific DNA-binding Proteins: Protein Design, Structure Prediction, and Binding Prediction

dc.contributor.advisor: DiMaio, Frank
dc.contributor.advisor: Baker, David
dc.contributor.author: McHugh, Lilian
dc.date.accessioned: 2025-10-02T16:05:09Z
dc.date.available: 2025-10-02T16:05:09Z
dc.date.issued: 2025-10-02
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Sequence-specific DNA-binding proteins (DBPs) perform critical roles in biology and biotechnology, and decades of effort have gone into engineering, predicting, and understanding their functions. In this work, I present methods to design novel DBPs, predict the structures of protein-DNA complexes, and predict the binding specificities of structurally diverse DBPs. We screened over 100,000 DBPs designed with custom computational methods and identified 44 that bound their intended targets with high affinity. Several of the designed DBPs are highly specific for their targets, as demonstrated by all-by-all cross-reactivity studies, mutation-scanning competition assays, and protein-binding microarrays. The designed DBPs bind in a manner consistent with their design models, as determined by interface ablation studies and crystallographic structure determination. For structure prediction, I developed and tested RoseTTAFoldNA (RFNA), the first end-to-end trained machine learning model that predicts the structures of any combination of protein and nucleic acids. RFNA accurately predicts about 30% of protein-nucleic acid complexes that have no sequence homology to the training set. For binding prediction, I curated a dataset of over 3000 DBPs with semi-manually assigned DNA-binding domains and hundreds of thousands of corresponding experimentally verified DNA target sequences. I fine-tuned RFNA and RoseTTAFold All-Atom on both a binary binding/non-binding classification task and the prediction of distilled protein-DNA complex structures. In a retrospective analysis of design results, the resulting fine-tuned model is able to enrich for functional designed DBPs. Using a simulated annealing inference approach, the fine-tuned model can also predict DNA-binding profiles for validation-set transcription factors with reasonable accuracy and efficiency.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: McHugh_washington_0250E_28910.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53916
dc.language.iso: en_US
dc.rights: CC BY-NC-SA
dc.subject: DNA-binding
dc.subject: Machine learning
dc.subject: Protein function
dc.subject: Structure prediction
dc.subject: Biochemistry
dc.subject: Artificial intelligence
dc.subject.other: Biological chemistry
dc.title: Sequence-Specific DNA-binding Proteins: Protein Design, Structure Prediction, and Binding Prediction
dc.type: Thesis

Files

Original bundle

Name: McHugh_washington_0250E_28910.pdf
Size: 6.2 MB
Format: Adobe Portable Document Format