Swanson, Willie J
Rivera, Alberto Marcos
2023-01-21
2023-01-21
2023-01-21
2022
Rivera_washington_0250E_24946.pdf
http://hdl.handle.net/1773/49688
Thesis (Ph.D.)--University of Washington, 2022

Duplication processes such as whole gene duplication and tandem domain expansion are important for the evolution and functional diversification of essential protein families. While whole gene duplications are well-established sources of new genes and biological novelty, less attention has been paid to how domain-level duplications also allow proteins to neofunctionalize. Genes encoding fertilization proteins are also some of the most rapidly evolving in the genome, which could enable the neofunctionalization of duplicated domains within these genes. Chapter 1 of this dissertation reviews multiple well-known gene families (Izumo, DCST, ZP, and the TFP superfamily) that arose from gene duplication. ZPs and TFPs also demonstrate tandem domain duplication and functional diversification. In chapter 2 of this dissertation, we present research into the evolutionary history of domain duplication and neofunctionalization within the zona pellucida amino-terminal (ZP-N) domain. A large-scale phylogenetic analysis across vertebrates revealed a divide between two classes of ZP-N domains: those that are paired with the ZP-C domain in the terminal ZP module (modular), and those outside of this module (free). This suggests that there was an initial ZP-N duplication event in vertebrates, which then produced a wide array of functionally diverse ZP-N domains. Machine learning classification also reveals that modular domains are more conserved at the level of both sequence and structure. In contrast, free domains are more divergent, and some in ZP2 show evidence of positive selection. While modular ZP-N domains may be conserved for a structural role, free ZP-N domains have experienced a history of duplications and neofunctionalization in fertilization.
Chapter 3 of this dissertation outlines transcriptomic research in abalone ovaries. Much of this research has been motivated by earlier sperm proteomic work from the Swanson lab, as well as advancements in sequencing technology. The aim is to identify ovary-expressed genes that play important roles in fertilization. We have identified multiple ZP proteins that show homology with previously sequenced ZP pseudogenes. One of the newly described ZP proteins has a duplicated ZP-N domain, and phylogenetic analysis suggests that this duplication occurred independently of the vertebrate ZP-N expansions. This transcriptome analysis also identified five abalone ovary TFPs, which may have experienced structural modifications relative to published TFPs. Taken together, the research findings suggest a history of recurrent independent co-option, structural modification, and functional diversification of fertilization proteins. In chapter 4, we discuss several possible extensions of this research, including more extensive positive selection analyses and co-evolutionary analyses of the transcriptome data.

application/pdf
en-US
CC BY
Duplication
Evolution
Fertilization
Machine Learning Classification
Phylogenetics
Transcriptome
Genetics
Genetics
Investigating the duplication and evolution of essential fertilization proteins
Thesis