Bayesian Nonparametric Inference of Effective Population Size Trajectories from Genomic Data
Palacios Roman, Julia Adela
Phylodynamics is an area at the intersection of phylogenetics and population genetics that aims to reconstruct population size trajectories from genetic data. Phylodynamic methods rely on a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. The shape of a genealogy is influenced by the effective population size trajectory and, under the coalescent framework, the times at which genealogical lineages coalesce contain information about population size dynamics. I show that these coalescent times can be viewed as a realization of a point process and that estimating population size trajectories is equivalent to estimating the conditional intensity of the coalescent point process. This thesis presents a Gaussian process-based Bayesian nonparametric approach to estimating effective population size trajectories. First, I summarize and discuss current approaches to statistical inference in phylodynamics. Next, I demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics when the genealogy is assumed fixed. I compare the Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. I then show that when a representative genealogy is available, perhaps estimated using one of the phylogenetic reconstruction methods, inference can be performed by integrated nested Laplace approximation (INLA) instead of Markov chain Monte Carlo (MCMC). This approximation, actively used in spatial statistics, recovers population size trajectories much faster than current MCMC-based methods. However, the INLA algorithm cannot be generalized to a more realistic setting, where one starts with molecular data instead of a genealogy.
Therefore, I return to MCMC and extend the GP approach to infer population size trajectories directly from molecular data. I test the GP-based method on simulated and real data. For real data, I estimate the effective number of individuals infected with hepatitis C virus in Egypt from 1700 to 1993, the effective number of individuals infected with human influenza A virus in New York between 2000 and 2005, and the effective number of bison across Beringia from the present to 100,000 years ago.
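The link between coalescent times and population size can be illustrated with a minimal sketch. It is not the GP-based method of the thesis. It covers only the textbook special case of a constant effective population size with all individuals sampled at the same time, where the waiting time while k lineages remain is exponential with rate C(k,2)/N. The function names, the sample size, and the population size below are hypothetical choices made for illustration.

```python
import random
from math import comb


def simulate_coalescent_intervals(n_samples, pop_size, rng):
    """Simulate waiting times between coalescent events.

    Assumes a constant effective population size `pop_size` (in the same
    units as time) and simultaneous sampling. Returns a list of
    (k, w_k) pairs, where w_k is the time spent with k lineages.
    """
    intervals = []
    for k in range(n_samples, 1, -1):
        # With k lineages, the coalescent intensity is C(k, 2) / N.
        rate = comb(k, 2) / pop_size
        intervals.append((k, rng.expovariate(rate)))
    return intervals


def constant_size_mle(intervals):
    """Maximum likelihood estimate of a constant population size.

    Each rescaled interval C(k, 2) * w_k is exponential with mean N,
    so the estimate is the average of the rescaled intervals.
    """
    scaled = [comb(k, 2) * w for k, w in intervals]
    return sum(scaled) / len(scaled)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
intervals = simulate_coalescent_intervals(200, 1000.0, rng)
estimate = constant_size_mle(intervals)
print(f"estimated effective population size: {estimate:.1f}")
```

Nonparametric methods such as those compared in the abstract replace the single constant N with a function N(t), so the intensity C(k,2)/N(t) varies along the genealogy and must be estimated as a whole trajectory.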
- Statistics