Massively parallel functional dissection of regulatory elements
Abstract
Massively parallel sequencing has accelerated the cataloging of cis-regulatory elements in mammalian genomes. However, it remains challenging to estimate the functional effects of variation in cis-regulatory elements. Current methods to measure such effects are labor-intensive and involve testing each variant separately. This dissertation describes the development of methods to interrogate the functional effects of cis-regulatory variants in a massively parallel fashion. First, I present a method that takes advantage of massively parallel DNA synthesis and massively parallel sequencing to test the functional effects of all possible single nucleotide variants of a given cis-regulatory element en masse in a single assay. As a proof of concept, this method was applied to perform saturation mutagenesis of three bacteriophage core promoters and three core promoters recognized by the mammalian Pol II transcriptional machinery. Microarray-synthesized mutant promoters, each with a unique 20 bp tag sequence downstream of the transcription start site, were subjected to in vitro transcription, and the resulting RNA-derived tags were sequenced. The relative abundance of each programmed tag provided a digital readout of the transcriptional efficiency of its cis-linked mutant promoter. Next, I describe a method to generate long, accurate reads from the short, error-prone reads produced by current massively parallel sequencing platforms. This strategy, referred to as "subassembly", is of broad utility in a wide range of contexts, including but not limited to metagenomics, de novo genome assembly, and detection of rare variants in clinical samples. It also enables the interrogation of regulatory elements longer than the read lengths currently supported by massively parallel sequencing platforms.
Finally, I present an improved version of the saturation mutagenesis method that incorporates the "subassembly" technique, and use it to dissect mammalian enhancers up to 620 bp long in a massively parallel in vivo assay. Development of such methods for rapid functional analysis of regulatory elements will not only facilitate the interpretation of variation and the understanding of the architecture and grammar of these elements, but also enable the design of novel synthetic regulatory elements.
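The tag-based readout described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's actual analysis pipeline: the function name `tag_readout`, the variant labels, the 4-nt tags (shortened from 20 bp for readability), and the choice to express each variant's tag count relative to a wild-type reference are all assumptions made for illustration.

```python
from collections import Counter


def tag_readout(rna_tags, tag_to_variant, reference="WT"):
    """Count RNA-derived tags per promoter variant and express each count
    relative to the reference variant (hypothetical normalization)."""
    # Start every known variant at zero so unobserved variants are reported.
    counts = Counter({variant: 0 for variant in tag_to_variant.values()})
    for tag in rna_tags:
        variant = tag_to_variant.get(tag)
        if variant is not None:  # reads whose tag was never programmed are skipped
            counts[variant] += 1
    if counts[reference] == 0:
        raise ValueError("no reads observed for the reference variant")
    return {variant: n / counts[reference] for variant, n in counts.items()}


# Hypothetical tag-to-variant table and sequenced tags.
tag_to_variant = {"AAAA": "WT", "CCCC": "A10G", "GGGG": "T25C"}
rna_tags = ["AAAA"] * 4 + ["CCCC"] * 2 + ["GGGG"] * 6 + ["TTTT"]

example = tag_readout(rna_tags, tag_to_variant)
print(example)
```

In this toy input, a variant whose tags are sequenced less often than the wild-type tags receives a value below 1, and one sequenced more often receives a value above 1. A real analysis would likely also account for how abundant each tagged template was in the input library, which this sketch leaves out.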