Structural Variation and Expression of Segmentally Duplicated Human Genes
| dc.contributor.advisor | Eichler, Evan | |
| dc.contributor.author | Dishuck, Philip | |
| dc.date.accessioned | 2025-01-23T20:09:17Z | |
| dc.date.issued | 2025-01-23 | |
| dc.date.submitted | 2024 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2024 | |
| dc.description.abstract | Gene duplication is a major driver of evolution, and the African ape lineage, including humans, experienced a burst of segmental duplications (SDs). Recent gene duplications help explain the rapid phenotypic changes in humans despite a slowdown in point mutations in primates. However, these genes are particularly difficult to study due to limitations in sequencing and assembly; the first complete human assembly, including all segmentally duplicated genes, was not finished until 2022. Long-read DNA sequencing (PacBio HiFi [high-fidelity] and ONT [Oxford Nanopore Technologies]) now enables the routine assembly of highly contiguous human genomes, and long-read cDNA sequencing (Iso-Seq) allows paralog-specific assessment of gene models and identification of isoforms. In this thesis, I analyze the gene duplications of some of the first human HiFi and ONT assemblies and use Iso-Seq to functionally annotate recent duplications. I characterized 170 highly contiguous human haplotypes containing 47 Mbp of additional SD content absent from the first complete reference assembly. Using Iso-Seq, I annotated the segmentally duplicated genes in these assemblies, discovering 201 new genes in copy number polymorphic gene families. These include a coding gene fusion NSFP1-LRRC37A2 in an inverted form of the MAPT (tau) locus, and a KRAB-zinc finger gene present in 36% of haplotypes that has only 69% amino acid identity to the best-matching annotated human gene. To validate long-read assemblies, I created a method, called GAVISUNK, that uses the distance between singly unique nucleotide k-mers (SUNKs) in ultra-long ONT reads to validate the structure of HiFi assemblies. This method identifies structural errors in assemblies and allows confident downstream analysis of structural variation, unbiased by assembly artefacts.
I performed a detailed analysis of a high copy number gene family, NPIP, which displays signatures of positive selection on the human and African ape lineages. Of 28 named human paralogs, I found that just three are fixed at a single copy (NPIPB2, B11, and B14). I found evidence of ongoing gene duplications, deletions, interlocus gene conversion, and large inversions mediated by NPIP duplication blocks. Two paralogs (B9 and B15) were within the most extreme percentile of tests for positive selection and selective sweeps. Full-length cDNA from 101 tissue/cell types revealed distinct gene models for subgroups of NPIPs, including a variable number tandem repeat (VNTR) that encodes a variably sized beta helix. Paralogs in that subgroup show enriched expression in brain tissue, while others retain the ancestral testis-enriched expression. These analyses reveal mechanisms for rapid evolution of duplicated genes and demonstrate their polymorphism among humans. | |
| dc.embargo.lift | 2026-01-23T20:09:17Z | |
| dc.embargo.terms | Restrict to UW for 1 year -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Dishuck_washington_0250E_27753.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/52803 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Genetics | |
| dc.subject | Bioinformatics | |
| dc.subject.other | Genetics | |
| dc.title | Structural Variation and Expression of Segmentally Duplicated Human Genes | |
| dc.type | Thesis | |
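The abstract describes GAVISUNK only at a high level: singly unique nucleotide k-mers (SUNKs) are located in ultra-long ONT reads, and the spacing between them is compared with the spacing in the HiFi assembly. The Python sketch below illustrates that idea on toy strings. It is not the GAVISUNK code. The function names (`find_sunks`, `supported_intervals`, `unsupported_gaps`), forward-strand-only k-mer counting, exact k-mer matching in reads, and the fixed distance tolerance are all illustrative assumptions.

```python
"""Toy sketch of SUNK-distance assembly validation, in the spirit of GAVISUNK.

Assumptions (not taken from GAVISUNK itself): sequences are plain strings,
SUNKs are counted on the forward strand only, SUNKs in reads match exactly,
and a read "supports" the assembly between two consecutive SUNK hits when the
read distance and assembly distance agree within a fixed tolerance.
"""
from collections import Counter


def find_sunks(assembly: str, k: int) -> dict[str, int]:
    """Map each k-mer occurring exactly once in `assembly` to its start position."""
    kmers = [assembly[i:i + k] for i in range(len(assembly) - k + 1)]
    counts = Counter(kmers)
    return {kmer: pos for pos, kmer in enumerate(kmers) if counts[kmer] == 1}


def supported_intervals(read: str, sunks: dict[str, int], k: int,
                        tolerance: int) -> list[tuple[int, int]]:
    """Assembly intervals [a1, a2) between consecutive SUNK hits that the read confirms."""
    # (position in read, position in assembly) for every SUNK found in the read
    hits = [(i, sunks[read[i:i + k]])
            for i in range(len(read) - k + 1) if read[i:i + k] in sunks]
    intervals = []
    for (r1, a1), (r2, a2) in zip(hits, hits[1:]):
        # Same order in read and assembly, and concordant spacing -> supported
        if a2 > a1 and abs((r2 - r1) - (a2 - a1)) <= tolerance:
            intervals.append((a1, a2))
    return intervals


def unsupported_gaps(intervals: list[tuple[int, int]], start: int,
                     end: int) -> list[tuple[int, int]]:
    """Sub-intervals of [start, end) not covered by any supported interval."""
    gaps, cursor = [], start
    for a, b in sorted(intervals):
        if a > cursor:
            gaps.append((cursor, a))
        cursor = max(cursor, b)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


# Toy usage: the read carries 5 bp that the assembly lacks between the
# SUNKs at assembly positions 3 and 6, so that stretch is left unsupported.
assembly = "ACGGTCAATTGC"
sunks = find_sunks(assembly, k=3)
read = "ACGGTC" + "NNNNN" + "AATTGC"
intervals = supported_intervals(read, sunks, k=3, tolerance=1)
print(unsupported_gaps(intervals, min(sunks.values()), max(sunks.values())))  # [(3, 6)]
```

Reporting gaps between SUNK start positions, rather than per-base coverage, reflects that this kind of check can only vouch for assembly stretches bracketed by SUNKs that a read spans; long stretches with no SUNKs at all (for example, near-identical duplications) remain unassessed in this toy version.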
Files
Original bundle
- Name: Dishuck_washington_0250E_27753.pdf
- Size: 16.57 MB
- Format: Adobe Portable Document Format
