Expanding the accuracy, resolution, and breadth of cell-free DNA investigation
Snyder, Matthew William
When cells die, they don’t simply vanish without a trace. Instead, they leave behind fingerprints of their genetic and epigenetic identities in the form of cell-free DNA (cfDNA), the scant amount of highly fragmented DNA circulating in human plasma. As the detritus of apoptotic and necrotic cell death in multiple tissues throughout the body, this class of molecule serves as a powerful biomarker for noninvasive detection and monitoring of disease processes and physiological conditions, including pregnancy, organ transplantation, and a growing number of cancers. Despite this promise, current methods for interrogating cfDNA are challenged by limited resolution, imperfect accuracy, and constrained breadth. Taken as a whole, these factors restrict the set of conditions that might in principle be detected or monitored with this molecular evidence. In this dissertation, I directly address these limitations with the goal of expanding the scope and precision of the “liquid biopsy,” or the noninvasive monitoring of health status through cfDNA analysis. I first address the limited resolution of cfDNA testing in the context of pregnancy by developing statistical methods for inference of the entire fetal genome at the single-nucleotide level, including both inherited and de novo variation. I show that the use of parental haplotypes and maternal cfDNA in a hidden Markov model can yield highly accurate prediction of inherited fetal genotypes. I next determine that the length of parental haplotype blocks is a key parameter driving the prediction accuracy, and demonstrate a method for increasing block length and thereby improving downstream inference. I explain how these approaches, coupled with improved methods for detection of de novo variation, open the door to a single, noninvasive test with the possibility of prenatal detection of more than 3,000 highly penetrant single-gene disorders.
I next demonstrate a method for improving the accuracy and positive predictive value (PPV) of noninvasive screening for fetal aneuploidy. In the most popular screening methodologies, PPVs of cfDNA-based tests are limited by a combination of the low incidence of trisomic pregnancies and the small number of false-positive tests in which truly euploid pregnancies are incorrectly classified as aneuploid. I investigate the causes of false-positive test results in a small cohort, and determine that maternal copy-number variants (CNVs) contribute substantially to the burden of spurious findings. I further develop a statistical framework for quantifying the likely impact of maternal CNVs by size and tested chromosome. I then propose a straightforward method for addressing this analytical limitation and improving test accuracy.

Finally, I develop a new approach to disentangle the various tissues or cell types contributing to cfDNA in a biological sample, potentially expanding the breadth of physiological conditions that can be monitored in this way. Here, I show that the locations of cfDNA fragment endpoints provide evidence of the positions of proteins on the DNA in vivo in the contributing cells, and use these endpoints to infer the spacing of nucleosomes and transcription factors genome-wide. I demonstrate that these positions correlate with gene expression profiles, and use these data to model cell type contributions in healthy individuals, where the expected myeloid and lymphoid cell lineages are recovered. I then apply this analytical framework to a cohort of individuals with advanced cancers, and recover the tissue-of-origin of the primary tumor for a subset of the cancers.
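The dependence of PPV on incidence noted above for aneuploidy screening follows from Bayes' rule, and a small numerical sketch may make it concrete. The function name and the sensitivity, specificity, and prevalence values below are illustrative assumptions, not figures from the dissertation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(condition present | positive test) via Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test characteristics (assumed for illustration only).
SENSITIVITY = 0.99
SPECIFICITY = 0.999

# Hypothetical prevalences, from rare to common.
for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}")
```

Under these assumed values, a test with a false-positive rate of only 0.1% still yields a PPV near one half when the condition affects 1 in 1,000 pregnancies, which is why removing even a small source of false positives can matter.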
- Genetics