Regulatory variation and human disease
Abstract
Non-coding regulatory regions are strongly implicated in human disease via genetic studies. However, it is currently not possible to reliably and systematically interpret the functional consequences of genetic variation within any given transcription factor recognition sequence. To lay the groundwork for the assessment of regulatory variation in human disease, I comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a three-generation pedigree and 19 diverse human cell types. We identified hundreds of genetic variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein-DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. Examining variation across multiple cell types, we observed highly reproducible yet surprisingly plastic genomic binding landscapes, indicative of strong cell-selective regulation of CTCF occupancy. Comparison with massively parallel bisulfite sequencing data indicates that 41% of variable CTCF binding is linked to differential DNA methylation, concentrated at two critical positions within the CTCF recognition sequence. These results establish the feasibility of studying the regulatory architecture of human disease. I then apply the framework developed in the CTCF model system to the interpretation of genome-wide association studies (GWAS), which have identified many non-coding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by DNase I hypersensitive sites (DHSs). Of these DHSs, 88% are active during fetal development and are enriched for gestational exposure-related phenotypes. We identify distant gene targets for hundreds of DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrate tissue-selective enrichment of more weakly disease-associated variants within DHSs, and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. This dissertation establishes a framework for the study of regulatory variation, suggests pervasive involvement of regulatory DNA variation in common human disease, and provides pathogenic insights into diverse disorders.