Flexible strategies for association analysis with genomic phenotypes
Morrison, Jean Victoria
Advances in high-throughput sequencing have led to a proliferation of genomic assays that exploit sequencing to measure epigenetic traits at very high resolution, sometimes at every base pair. Studying the relationship between these molecular traits, or genomic phenotypes, and cell- or organismal-level traits can lead to a better understanding of genetic regulation and the biological processes underlying variation. The most basic approach to this task is to search for associations between genomic phenotypes and other traits such as experimental condition or disease status. This undertaking can be challenging. Functional genomic elements, such as promoters, exons, and transcription factor binding sites, are the units of interest, but annotations of the boundaries of these elements are far from complete. We therefore face the two-pronged problem of finding associations and identifying the boundaries of the underlying signal. We make two novel proposals that accomplish these tasks simultaneously, resulting in data-adaptive region boundaries. In our first proposal, joint adaptive differential estimation (JADE), we approach the problem through estimation of the mean genomic phenotype, or profile, at each trait level. We use penalized regression to impose structure on these estimates and recover regions of association. JADE is powerful and provides a useful descriptive summary of the results by clustering profiles within associated regions. In our second proposal, the flexible robust excursion test (FRET), we employ results for scanning statistics to construct a method that searches the genome for areas with non-zero regression coefficients. While less powerful than JADE in some circumstances, FRET is more computationally efficient, more robust to outliers, and provides control of the region-wise false discovery rate.
We compare FRET and several alternative strategies applied to the problem of identifying genomic regions in which chromatin accessibility differs between drug-resistant and drug-susceptible cancer cell lines. Our results suggest that FRET is more powerful than the alternatives and that methods that rely on assumptions about the distribution of the data are poorly calibrated for this problem.
- Biostatistics