ResearchWorks Archive

    Reliable and interpretable inference of evolutionary history using Bayesian phylogenetic approaches

    Magee_washington_0250E_22910.pdf (4.337Mb)
    Author
    Magee, Andrew Fergus
    Abstract
    Phylogenetic trees are key objects for understanding evolutionary history, first used to describe relationships between groups of species. Phylogenies help us to fill out the tree of life and to describe the dynamics that have given rise to the diversity of life on Earth. As we have not witnessed the entire history of any group, phylogenies must be inferred from character data (often DNA sequence data) using statistical models. If we can specifically infer trees with a time component, such that we can measure the lengths of branches in real time, we can attempt to make inferences about the processes that gave rise to the phylogeny itself. In the case of species histories (a macroevolutionary process), we use birth-death models. Birth-death models, and time-calibrated phylogenies in general, are also useful in describing the course of infectious disease outbreaks, an application area known as infectious disease phylodynamics. In this thesis, I (and co-authors) develop new birth-death models applicable to both macroevolutionary and phylodynamic applications. First, we describe a parameter-rich time-varying birth-death model, which allows for birth, death, sampling, and death-upon-sampling. In macroevolutionary applications, birth is speciation, death is extinction, and sampling is fossilization (plus later recovery of the fossil). Death-upon-sampling is primarily useful in phylodynamic applications, where it models treatment or isolation after a diagnosis, and where birth is infection, death is recovery (absent treatment), and sampling is sequencing of the infectious disease agent (such as a virus). Our model includes all these processes for individual lineages, plus the possibility that there are instantaneous events applicable to all lineages. It is the first model to include these all-lineage-event versions of all four processes. 
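The four per-lineage processes named above (birth, death, sampling, and death-upon-sampling) can be illustrated with a small event simulation. This is only a sketch under simplifying assumptions and not the model from the thesis: the rates are constant rather than time-varying, the instantaneous all-lineage events are left out, at least one rate must be positive, and the function name `simulate_counts` and its parameters are hypothetical.

```python
import random

def simulate_counts(birth, death, sampling, removal_prob, t_max, seed=0):
    """Simulate a constant-rate birth-death-sampling process from one lineage.

    Returns (lineages alive at t_max, number of samples taken).
    Assumes birth + death + sampling > 0.
    """
    rng = random.Random(seed)
    t, alive, samples = 0.0, 1, 0
    total = birth + death + sampling  # total per-lineage event rate
    while alive > 0:
        # Waiting time until the next event among all living lineages
        t += rng.expovariate(alive * total)
        if t >= t_max:
            break
        u = rng.random() * total
        if u < birth:
            alive += 1        # birth: speciation or infection
        elif u < birth + death:
            alive -= 1        # death: extinction or recovery
        else:
            samples += 1      # sampling: fossil recovery or sequencing
            if rng.random() < removal_prob:
                alive -= 1    # death-upon-sampling: treatment or isolation
    return alive, samples
```

Setting `removal_prob` to 0 roughly corresponds to the macroevolutionary reading, where a lineage persists after leaving a fossil, while values near 1 correspond to the phylodynamic reading, where a diagnosed case is treated or isolated.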
Using Bayesian inference, we demonstrate the usefulness of this model in application to a previously inferred phylogeny of Crocodylomorpha (crocodiles and their relatives). We investigate the impact of the K-Pg (end-Cretaceous) mass extinction and find a very strong and very robust imprint of it in the phylogeny of Crocodylomorpha. Next, we describe time-varying priors applicable to rates of birth, death, and sampling through time. Specifically, we investigate the performance of the horseshoe Markov random field as a birth-death model prior and contrast it with a Gaussian Markov random field. In simulations, the horseshoe model performs quite well and appears capable of balancing the power to detect rate variation with the ability to distinguish true rate variation from noise in the birth-death process. In full Bayesian analyses of real datasets (inferring the tree and birth-death model from sequence data), we detect a clear signature of a speciation-rate decrease in a group of Australian geckos and estimate that the HIV epidemic among Russian and Ukrainian drug users peaked between roughly 1993 and 2000. Lastly, we turn our attention back to the matter of inferring phylogenies. As phylogenetic posterior distributions are difficult to work with directly, we must instead approximate them using samples from Markov chain Monte Carlo. In this chapter, we ask whether it is possible to quantify the variability (also called Monte Carlo error) inherent in this procedure. Using a novel simulation approach, we find that the Monte Carlo error in important quantities (such as the summary tree) can in fact be reliably quantified. Application to benchmark datasets shows the danger inherent in the currently common approaches of either ignoring the sampling variability in the tree or using proxies.
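For a scalar summary of an MCMC run, Monte Carlo error is often estimated with batch means. The sketch below is a generic illustration of that standard estimator, not the simulation approach developed in the thesis, and it does not extend to tree-valued quantities such as the summary tree. The function name `batch_means_mcse` and the default of 20 batches are assumptions made for this example.

```python
import math

def batch_means_mcse(draws, n_batches=20):
    """Batch-means Monte Carlo standard error of the mean of `draws`."""
    size = len(draws) // n_batches  # draws per batch; any remainder is dropped
    if size < 1 or n_batches < 2:
        raise ValueError("need at least two batches with one draw each")
    # Mean of each consecutive, non-overlapping batch
    means = [sum(draws[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    grand = sum(means) / n_batches
    # Sample variance of the batch means
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)
```

One possible use is to pass a 0/1 sequence recording whether a given split appears in each sampled tree, which gives an approximate standard error for that split's estimated posterior probability.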
    URI
    http://hdl.handle.net/1773/47355
    Collections
    • Biology [192]

    DSpace software copyright © 2002-2015  DuraSpace
    Contact Us
    Theme by 
    @mire NV
     

     
