DiMaio, Frank
Baker, David
McHugh, Lilian
2025-10-02
2025-10-02
2025-10-02
2025
McHugh_washington_0250E_28910.pdf
https://hdl.handle.net/1773/53916
Thesis (Ph.D.)--University of Washington, 2025
Sequence-specific DNA-binding proteins (DBPs) perform critical roles in biology and biotechnology, and decades of effort have gone into engineering, predicting, and understanding their functions. In this work, I present methods to design novel DBPs, predict the structures of protein-DNA complexes, and predict the binding specificities of structurally diverse DBPs. We screened over 100,000 DBPs designed with custom computational methods and identified 44 that bound their intended targets with high affinity. Several of the designed DBPs are highly specific for their targets, as demonstrated by all-by-all cross-reactivity studies, mutation-scanning competition assays, and protein-binding microarrays. The designed DBPs bind in a manner consistent with their design models, as determined by interface ablation studies and crystallographic structure determination. For structure prediction, I developed and tested RoseTTAFoldNA (RFNA), the first end-to-end trained machine learning model that predicts structures for any combination of proteins and nucleic acids. RFNA accurately predicts about 30% of protein-nucleic acid complexes that have no sequence homology to the training set. For binding prediction, I curated a dataset of over 3000 DBPs with semi-manually assigned DNA-binding domains and hundreds of thousands of corresponding experimentally verified DNA target sequences. I fine-tuned RFNA and RoseTTAFold All-Atom on both a binary binding/non-binding classification task and the prediction of distilled protein-DNA complex structures. In a retrospective analysis of design results, the resulting fine-tuned model enriches for functional designed DBPs.
Using a simulated annealing inference approach, the fine-tuned model can also predict DNA-binding profiles for validation set transcription factors with reasonable accuracy and efficiency.
application/pdf
en-US
CC BY-NC-SA
DNA-binding; Machine learning; Protein function; Structure prediction; Biochemistry; Artificial intelligence; Biological chemistry
Sequence-Specific DNA-binding Proteins: Protein Design, Structure Prediction, and Binding Prediction
Thesis