ResearchWorks Archive

    Generalization of boosting algorithms and applications of Bayesian inference for massive datasets

    File
    9944172.pdf (6.267 MB)
    Date
    1999
    Author
    Ridgeway, Gregory Kirk, 1973-
    Abstract
    In recent years statisticians, computational learning theorists, and engineers have developed more advanced techniques for learning complex non-linear relationships from data. However, not only have models grown in complexity, but datasets have also outgrown many of the model-fitting methods in standard statistical practice. The first several sections of this dissertation show how boosting, a technique originating in computational learning theory, has wide application for learning non-linear relationships even when datasets are potentially massive. I describe particular applications of boosting to naive Bayes classification and regression, and to exponential family and proportional hazards regression models. I also show how these methods can easily incorporate many desirable properties, including robust regression, variance reduction, and interpretability. On both real and simulated datasets and in a variety of modeling frameworks, boosting consistently outperforms standard methods in terms of error on validation datasets.

    In separate but related work, the last chapter presents ideas for applying Bayesian methods to inference for massive datasets. Modern Bayesian analysis relies on Monte Carlo methods for sampling from complex posterior distributions, such as those arising from Bayesian hierarchical models. These methods slow down dramatically when the posterior distribution conditions on a large dataset that cannot be summarized by a small number of sufficient statistics. I develop an adaptive importance sampling algorithm that efficiently simulates draws from a posterior distribution conditioned on a massive dataset. I then propose a method for approximate Bayesian inference that uses likelihood clustering for data reduction.
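    For readers skimming the record, a minimal sketch of the stagewise boosting idea the abstract refers to, applied here to squared-error regression with small regression trees as base learners. The names and parameters (boost_regression, n_rounds, learning_rate) are illustrative assumptions, not the dissertation's notation or exact algorithm.

        # Hedged sketch: generic gradient boosting for squared-error regression.
        # Each round fits a small tree to the current residuals (the negative
        # gradient of squared-error loss) and adds a shrunken copy to the fit.
        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        def boost_regression(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
            f0 = float(np.mean(y))                 # constant initial fit
            pred = np.full(len(y), f0)
            trees = []
            for _ in range(n_rounds):
                residual = y - pred                # pseudo-residuals
                tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
                pred += learning_rate * tree.predict(X)
                trees.append(tree)
            return f0, trees

        def boost_predict(f0, trees, X, learning_rate=0.1):
            return f0 + learning_rate * sum(t.predict(X) for t in trees)

    Likewise, the adaptive importance sampler mentioned in the final chapter can be illustrated, under loose assumptions, by a generic self-normalized Gaussian importance sampler whose proposal is re-centered and re-scaled on the weighted draws each round. This stands in for, and should not be mistaken for, the specific algorithm developed in the thesis; log_post here is any user-supplied log-posterior function.

        # Hedged sketch: adaptive importance sampling with a Gaussian proposal.
        # Each round draws from N(mu, cov), computes self-normalized importance
        # weights against log_post, then adapts (mu, cov) to the weighted moments.
        import numpy as np

        def adaptive_is(log_post, dim, n_draws=2000, n_rounds=5, seed=0):
            rng = np.random.default_rng(seed)
            mu, cov = np.zeros(dim), np.eye(dim)    # initial Gaussian proposal
            for _ in range(n_rounds):
                L = np.linalg.cholesky(cov)
                draws = mu + rng.standard_normal((n_draws, dim)) @ L.T
                diff = draws - mu                   # log proposal density q(draws)
                sol = np.linalg.solve(cov, diff.T).T
                log_q = (-0.5 * np.einsum('ij,ij->i', diff, sol)
                         - 0.5 * np.linalg.slogdet(2 * np.pi * cov)[1])
                log_w = np.array([log_post(d) for d in draws]) - log_q
                w = np.exp(log_w - log_w.max())
                w /= w.sum()                        # self-normalized weights
                mu = w @ draws                      # match weighted mean...
                cov = ((draws - mu) * w[:, None]).T @ (draws - mu) \
                      + 1e-6 * np.eye(dim)          # ...and weighted covariance
            return draws, w

        # Usage: weighted draws approximating a (hypothetical) N(3, 1) posterior.
        draws, w = adaptive_is(lambda th: -0.5 * (th[0] - 3.0) ** 2, dim=1)
        post_mean = float(w @ draws[:, 0])          # close to 3.0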
    URI
    http://hdl.handle.net/1773/8986
    Collections
    • Statistics [81]

    DSpace software copyright © 2002-2015  DuraSpace
    Contact Us | Send Feedback
    Theme by @mire NV