Eichler, Evan E
Guitart, Xavi
2026-04-20
2026-04-20
2026-04-20
2026
Guitart_washington_0250E_29269.pdf
https://hdl.handle.net/1773/55510
Thesis (Ph.D.)--University of Washington, 2026

Segmental duplications (SDs) are a major source of genomic innovation, responsible for the genomic instability that accelerates novel gene function but also causes disease. Despite their importance, SDs have historically remained difficult to study due to their high sequence identity and copy number polymorphism, making them all but impossible to resolve with short-read sequencing and assembly. Recent advances in long-read sequencing and de novo genome assembly have made it possible to resolve these regions at haplotype resolution, enabling systematic investigation of complex duplicated gene families. This thesis leverages these technological advances to study the evolution, structural diversity, and regulation of TBC1D3, a primate-specific SD gene family implicated in neuronal progenitor proliferation, cortical expansion, and cancer.

TBC1D3 is a young and highly duplicated gene family dispersed across chromosome 17, with the majority of paralogs embedded in two large SD clusters at 17q12. Prior functional studies demonstrated that TBC1D3 promotes cellular proliferation in both cancer and neurodevelopmental contexts, yet it remained unclear how a gene family with extreme copy number variation could contribute to tightly regulated developmental processes. In this work, I address this paradox by integrating long-read genome assemblies, comparative primate genomics, population-scale human variation, and paralog-resolved transcriptomics.

In Chapter 2, I reconstruct the evolutionary history and human diversity of TBC1D3 using haplotype-resolved assemblies from 69 human haplotypes and 11 nonhuman primate species. I show that TBC1D3 independently expanded in at least five primate lineages and that humans experienced a recent expansion approximately 2–3 million years ago.
Human haplotypes exhibit extraordinary structural diversity, differing by up to ~1 Mbp and more than 20 gene copies, making TBC1D3 one of the most structurally variable gene families in the human genome. Despite this variability, signatures of positive selection are detected along the African ape lineage, and I show that all human-expressed copies share a derived, human-specific modification of the protein C terminus, suggesting functional divergence during recent human evolution. Using a pangenomic and phylogenetic framework, I define distinct paralog groups and demonstrate that TBC1D3 expression is overwhelmingly restricted to a single paralog group located at the telomeric end of cluster 2.

Chapter 3 investigates the regulatory basis of this striking paralog-specific expression. I demonstrate that TBC1D3 expression in human neural contexts is driven by a position-effect mechanism in which a fixed, copy number-constrained promoter derived from the neighboring gene NPEPPSP1 has been duplicated and fused upstream of a specific TBC1D3 paralog. Using comparative epigenomics, long-read transcriptomics, and neuronal differentiation models, I show that this NPEPPSP1–TBC1D3 fusion creates a dominant regulatory architecture that restricts transcription to a single copy despite extensive underlying copy number variation. This mechanism provides a parsimonious explanation for how TBC1D3 expression and function may remain stable while the surrounding gene family continues to diversify structurally.

In Chapter 4, I describe experimental efforts to interrogate the functional consequences of human-specific modifications to TBC1D3, including the derived C-terminal extension. Although these experiments were not ultimately successful, their outcomes inform hypotheses regarding protein localization, posttranslational regulation, and context-dependent function that motivate future work.
This thesis establishes a generalizable framework for studying complex SD gene families by integrating haplotype-resolved assemblies, evolutionary analysis, and paralog-aware regulatory interrogation. The findings reveal how SDs contribute to gene family expansion, regulatory innovation, and protein evolution, allowing rapid structural diversification. More broadly, this work demonstrates how long-read genomics enables direct investigation of genomic regions that have played a disproportionate role in human evolution and disease yet have remained largely inaccessible until now.

application/pdf
en-US
CC BY
Genetics
Genetics
Innovation in Duplication: Structural Diversity and Regulatory Control of Human Genes: TBC1D3
Thesis