New technologies for sequencing and interpreting genomes
Kitzman, Jacob Otto
A central goal of biomedical research is to catalog genetic variation and uncover its causal relationship to phenotype. As humans are diploid organisms with two copies of each chromosome (except for the sex chromosomes in males), a critical aspect of each individual's genetic variation is not only the full list of alleles he or she carries, but also the arrangement of those alleles (termed haplotypes) along each of the inherited chromosomes. Although current strategies for genome sequencing are highly effective at cataloging allelic variation, they are largely unable to link consecutive variant alleles onto longer haplotypes. This limitation arises from mechanical or enzymatic shearing used as one of the first steps by every sequencing technology in current use. Because shearing reduces input material to fragments much shorter than the average distance between adjacent heterozygous variants, the contiguity information encoded by the pairing of adjacent variants is lost. In this dissertation, I describe techniques that I developed to recover this contiguity information by preserving long fragments for several additional steps prior to sequencing. In a proof of concept, I show for an individual human genome that this approach can resolve haplotype phase with substantial completeness (94% of detected heterozygous variants) and across long physical spans (50% of the genome in blocks of ≥386 kbp). The ability to obtain haplotype-resolved genome sequences enables numerous important applications, of which I demonstrate several here. I show how this approach can be used to uncover novel aspects of genome structure, from genotyping variation in structurally complex, duplicated portions of the genome, to mapping positions of sequence not present in the human genome.
Finally, to demonstrate the translational value of this information, I combined haplotype-resolved sequences of parental genomes with maternal plasma DNA sequencing in order to non-invasively predict the genome sequence of a human fetus in the second trimester. As sequencing technology continues to expand in scale, quality, and scope of application, continued development of techniques to preserve contiguity information will remain an important area of focus.
- Genetics