Waterston, Robert H
Noble, William S
Durham, Timothy
2019-08-14
2019-08-14
2019
Durham_washington_0250E_19949.pdf
http://hdl.handle.net/1773/44274
Thesis (Ph.D.)--University of Washington, 2019

One of the principal questions in biology is how the genome encodes the information required to produce a multicellular organism. Somehow, the structure of the genome maps to every function in the living organism, from assembling the body plan during development to reacting to environmental stimuli or stressors. We know that much of this encoded information is decoded in the cell by gene regulatory networks: sets of transcription factor genes and their repertoire of binding sites throughout the genome that allow them to turn sets of genes on and off. If we could comprehensively map these gene regulatory networks and the settings in which they are active, we would have a deep, mechanistic understanding of how and why cells behave the way they do and how and why mutations in the genome affect phenotype. However, we are still in the very early days of this effort. To infer the gene regulatory networks, we first need to understand the "parts list" consisting of every gene and regulatory site, as well as precisely where and when those genes and regulatory sites are active. I present two projects that move the field closer to attaining this comprehensive census of gene expression and regulatory site activity. One is a project to apply single-cell Assay for Transposase-Accessible Chromatin followed by sequencing (scATAC-seq) to generate the first cell type-specific map of chromatin accessibility in Caenorhabditis elegans, a promising model organism for comprehensive regulatory network inference. The other is a machine learning framework for jointly modeling thousands of genome-wide experiments from large epigenomics data collections; the model can be used to summarize information from the collection and to impute (i.e., computationally infer) the results of missing experiments. To conclude, I describe the next challenge of mapping the connections among genes and regulatory sites, and one way that emerging single-cell and genome editing technologies might be used to begin attacking this problem at scale.

application/pdf
en-US
CC BY-NC-SA
Caenorhabditis elegans; chromatin accessibility; genomics; imputation; single cell; tensor factorization
Genetics; Bioinformatics; Systematic biology
Genetics
Toward comprehensive characterization of chromatin state
Thesis