Comprehensive, precision genomics
The past decade has seen a significant drop in the cost per base of DNA sequencing. Driven by a new era of 'next-generation' sequencing (NGS), there has been an explosion of new technologies that use DNA sequencing not just for primary sequence but for a wide variety of biological assays. Despite the versatility of NGS, it has a number of drawbacks, including high sample input requirements and short read lengths. Because of the latter, the majority of genome studies cannot resolve haplotype or structural variation, both of which require long-range information. Such variation can play an important role in evolution and disease, and long-range information is crucial for the <italic>de novo</italic> assembly of genomes. In this dissertation I describe and apply methods to overcome these obstacles. First, I describe a method for the construction of DNA sequencing libraries that uses a hyperactive transposase to fragment DNA and append universal sequencing primers in a single enzymatic step. This approach reduced the turnaround time from sample to sequencing-ready libraries and, because it involves fewer enzymatic steps, significantly reduced the sample input requirements. I then describe a modified version of the method that allowed a greater than 100-fold decrease in the input required to construct libraries for the detection of DNA methylation. Next, I discuss a method that uses the inherent properties of Tn5 transposase to provide long-range sequence information, which served as the input for a novel <italic>de novo</italic> genome assembly algorithm. I applied this method to human, mouse, and fly assemblies to produce output scaffolds with contiguity improvements of up to 75-fold and high accuracy. Last, I describe the application of long-range sequence information to haplotype-resolve the genome and epigenome of the aneuploid HeLa cancer cell line.
I investigated the global effects of copy number and haplotype on transcript abundance and the epigenetic landscape and identified a number of outliers, including haplotype-specific expression of the proto-oncogene <italic>MYC</italic>. I show that the mechanism responsible for this activation is the complex integration of the HPV-18 viral genome, including an epithelial-specific enhancer, at high copy number 500 kilobase pairs upstream of the <italic>MYC</italic> locus.