Molecular tagging to overcome limitations of massively parallel sequencing
Massively parallel technologies have dramatically reduced the cost and increased the throughput of DNA sequencing, transforming the study of patterns of genetic variation in humans and model organisms, the genetic basis of disease, and the organization, regulation, and function of genomes. However, the cost and throughput gains of the most widely used massively parallel sequencing platforms are offset by substantial drawbacks with respect to read length and base-calling accuracy, limiting their utility for many exciting and potentially important research and clinical applications. These include de novo assembly of genomes and metagenomes, diversity profiling of metagenomic communities and viral populations, synthetic functional assays to interrogate long genetic elements, and sensitive and highly multiplexed cancer-related gene sequencing. This dissertation describes a paradigm, "molecular tagging," which I developed to overcome the read length and error rate limitations of massively parallel sequencing, in the context of its use for three distinct applications. I first describe the development of a molecular tagging-based method called "Subassembly" and its application to de novo genome and metagenome assembly. Complexity-bottlenecked libraries of DNA fragments at least ~350 nucleotides in length were amplified, re-fragmented, and subjected to massively parallel sequencing such that we could identify groups of reads derived from the same longer progenitor fragment. Groups of reads were then locally assembled to generate highly accurate haploid consensus sequences that effectively corresponded to single input molecules, and these long consensus sequences were used to improve de novo genome assembly for single bacterial and metagenomic samples. Next, I describe the development and application of a massively parallel assay to dissect transcriptional enhancer elements with single nucleotide resolution. We applied this method to determine the functional consequences of all possible single nucleotide changes in three mammalian enhancers; this assay involved an optimized version of the Subassembly experimental protocol and analytical strategy. Finally, I describe the characterization and application of a multiplex assay for ultra-sensitive targeted sequencing of cancer-related genes in clinical tumor samples. I integrated the molecular tagging paradigm with the molecular inversion probe multiplex capture strategy to develop a method that is simultaneously simple, rapid, cost-effective, and ultra-sensitive for sub-clonal variation in genetically heterogeneous tissue samples. Molecular tagging is therefore a broadly applicable strategy for overcoming key limitations of massively parallel sequencing, and it is expanding the utility of this already transformative group of technologies to a number of important research and clinical applications.
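As a rough illustration of the tagging idea described above, the following Python sketch groups reads by a shared molecular tag and takes a per-position majority vote to produce one consensus sequence per input molecule. It is a simplified stand-in, not the dissertation's pipeline: it assumes that reads within a group are already aligned to the same start position, whereas Subassembly locally assembles reads that cover different parts of a longer fragment. The function names, the `(tag, sequence)` input format, and the `min_reads` and `min_fraction` thresholds are all hypothetical.

```python
from collections import Counter, defaultdict


def group_by_tag(reads):
    """Group read sequences by molecular tag.

    `reads` is an iterable of (tag, sequence) pairs; this input format
    is an assumption made for the sketch.
    """
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    return dict(groups)


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote over reads assumed to share a start.

    Positions where the most common base falls below `min_fraction`
    of the reads are reported as 'N'. The threshold is illustrative.
    """
    length = min(len(s) for s in seqs)  # compare only the shared prefix
    called = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        called.append(base if count / len(seqs) >= min_fraction else "N")
    return "".join(called)


def tag_consensus(reads, min_reads=3):
    """Return one consensus per tag, skipping tags with too few reads."""
    return {
        tag: consensus(seqs)
        for tag, seqs in group_by_tag(reads).items()
        if len(seqs) >= min_reads
    }


if __name__ == "__main__":
    example = [
        ("AAT", "ACGT"),
        ("AAT", "ACGT"),
        ("AAT", "ACTT"),  # one read carries a sequencing error
        ("GGC", "TTTT"),  # tag seen only once, so it is skipped
    ]
    print(tag_consensus(example))
```

In this toy input the single discordant base in the third read is outvoted, which is the intuition behind using tags to separate sequencing errors from true variation in the original molecule.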
- Genetics