Human-specific duplicate genes: new frontiers for disease and evolution
Abstract
Gene duplication is a fundamental force contributing to the evolution of novel traits, genomic diversity among species and individuals, and disease. In this dissertation, I characterize the evolutionary history, diversity, functional potential, and disease relevance of gene families that emerged specifically along the lineage leading to humans. I leveraged a haploid clone library to resolve the sequence and structure of four human SRGAP2 paralogs, adding ~380 kbp of sequence to the human reference genome. Analyzing this high-quality sequence, I found that the promoter and first nine exons of SRGAP2 duplicated three times across chromosome 1, between ~3.4 and ~1 million years ago. All paralogs produce mRNA transcripts, but SRGAP2C is the most highly expressed and is fixed in copy number in the human population, making it the most likely functional duplicate. To screen large cohorts of autism and intellectual disability patients for mutations that disrupt SRGAP2C, I developed a method to genotype paralog-specific copy number and sequence variation using molecular inversion probes. I demonstrated that this method is broadly applicable to large-scale genotyping of previously inaccessible duplicated genes. Using this method, I also discovered regions of interlocus gene conversion between duplicated sequences >80 Mbp apart on the same chromosome and refined unequal crossover breakpoints for copy number polymorphisms at the RH locus. Finally, I applied my genotyping method and the strategies used to characterize the SRGAP2 duplications to study BOLA2, a gene at chromosome 16p11.2 that duplicated specifically in Homo sapiens. Sequencing this region in orangutan and chimpanzee revealed drastic rearrangements between species, including six inversions affecting 47 genes. I determined that an ~95 kbp segment including BOLA2 duplicated ~282 thousand years ago, specifically predisposing humans to recurrent microdeletions and microduplications associated with autism. 
I demonstrate that, despite its young age and the susceptibility to rearrangements it confers, the BOLA2 duplication has nearly fixed in the human lineage. I show that the BOLA2 duplication produced a Homo sapiens-specific in-frame fusion transcript and that BOLA2 expression correlates with genomic copy number. Collectively, my work provides new insights into the birth, evolution, and disease relevance of duplicate genes, pioneers new genotyping technology, and identifies specific gene innovations as novel candidates for the evolution of uniquely human traits.