Statistical Methods to Estimate Evolutionary and Technical Parameters Using Whole Genome Sequence Data
Abstract
Whole genome sequence data are widely used in humans and other species to reveal evolutionary patterns and recent demographic history. In this dissertation, we introduce three new statistical methods that can be used to estimate technical parameters such as genotype error rates, as well as parameters related to genome evolution and recent demographic history, using whole genome sequence data from humans and SARS-CoV-2. In our first method, we propose a model that calculates the likelihood of observed parent-offspring trio genotypes, adjusting for both genotype errors and uncalled deletions. We fit our model to SNVs in 77 White British trios identified in the UK Biobank whole genome sequence data, obtaining estimates for the genotype error and uncalled deletion rates in this dataset. In our second method, we formulate a model to estimate the mean length of gene conversion tracts. Our model uses a separate per-site allele conversion rate for each observed tract. We fit this model to gene conversion tracts detected from the UK Biobank whole autosome sequence data and infer the mean length of gene conversion tracts in humans. Finally, in our third method, we propose a hidden Markov model that accounts for mutations and genotype errors to detect recombinant SARS-CoV-2 sequences.
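
As a rough illustration of the first method's idea, the sketch below computes the likelihood of an observed parent-offspring trio genotype at a single biallelic SNV by summing over the unobserved true genotypes. It is not the dissertation's model. It assumes Hardy-Weinberg genotype frequencies in the parents, Mendelian transmission, and a single symmetric genotype error rate, and it omits uncalled deletions entirely. All function names and parameters (`trio_likelihood`, `p`, `err`) are hypothetical.

```python
from itertools import product


def hwe_prior(g, p):
    """Hardy-Weinberg probability of genotype g (alt-allele count 0, 1, 2)
    at alt-allele frequency p. Assumed prior for the parents."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]


def child_given_parents(c, m, f):
    """Mendelian probability of child genotype c given true parental
    genotypes m and f. A parent with genotype g transmits alt with prob g/2."""
    tm, tf = m / 2, f / 2
    return [(1 - tm) * (1 - tf), tm * (1 - tf) + (1 - tm) * tf, tm * tf][c]


def observe(obs, true, err):
    """Simplified error model (an assumption): the call is correct with
    probability 1 - err, otherwise each wrong genotype is equally likely."""
    return 1 - err if obs == true else err / 2


def trio_likelihood(obs_trio, p, err):
    """Likelihood of observed (mother, father, child) genotypes,
    marginalized over all 27 combinations of true genotypes."""
    om, of, oc = obs_trio
    total = 0.0
    for m, f, c in product(range(3), repeat=3):
        total += (
            hwe_prior(m, p)
            * hwe_prior(f, p)
            * child_given_parents(c, m, f)
            * observe(om, m, err)
            * observe(of, f, err)
            * observe(oc, c, err)
        )
    return total
```

Under this simplified model, a Mendelian-inconsistent trio such as two homozygous-reference parents with a homozygous-alternate child has zero likelihood when `err` is 0 and a small positive likelihood otherwise. That is what lets such trios inform an error-rate estimate when the likelihood is maximized over many sites.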
Description
Thesis (Ph.D.)--University of Washington, 2025
