ResearchWorks Archive

Generalization of boosting algorithms and applications of Bayesian inference for massive datasets

dc.contributor.author Ridgeway, Gregory Kirk, 1973- en_US
dc.date.accessioned 2009-10-06T22:55:01Z
dc.date.available 2009-10-06T22:55:01Z
dc.date.issued 1999 en_US
dc.identifier.other b43914652 en_US
dc.identifier.other 44004145 en_US
dc.identifier.other Thesis 48720 en_US
dc.description Thesis (Ph. D.)--University of Washington, 1999 en_US
dc.description.abstract In recent years statisticians, computational learning theorists, and engineers have developed more advanced techniques to learn complex non-linear relationships from datasets. However, not only have models increased in complexity, but datasets have also outgrown many of the computational methods for fitting these models that are standard in statistical practice. The first several sections of this dissertation show how boosting, a technique originating in computational learning theory, has wide application for learning non-linear relationships even when the datasets are potentially massive. I describe particular applications of boosting for naive Bayes classification and regression, and for exponential family and proportional hazards regression models. I also show how these methods can easily incorporate many desirable properties, including robust regression, variance reduction methods, and interpretability. On both real and simulated datasets and in a variety of modeling frameworks, boosting consistently outperforms standard methods in terms of error on validation datasets. In separate but related work, the last chapter presents ideas for using Bayesian methods for inference in massive datasets. Modern Bayesian analysis relies on Monte Carlo methods for sampling from complex posterior distributions, such as those arising from Bayesian hierarchical models. These methods slow down dramatically when the posterior distribution conditions on a large dataset that cannot be summarized by a small number of sufficient statistics. I develop an adaptive importance sampling algorithm that efficiently simulates draws from a posterior distribution conditioned on a massive dataset. I also propose a method for approximate Bayesian inference that uses likelihood clustering for data reduction. en_US
dc.format.extent vii, 176 p. en_US
dc.language.iso en_US en_US
dc.rights Copyright is held by the individual authors. en_US
dc.rights.uri For information on access and permissions, please see en_US
dc.subject.other Theses--Statistics en_US
dc.title Generalization of boosting algorithms and applications of Bayesian inference for massive datasets en_US
dc.type Thesis en_US
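The abstract notes that Monte Carlo samplers slow down when the posterior conditions on a large dataset with no small set of sufficient statistics. The Python sketch below illustrates that cost using plain (non-adaptive) self-normalized importance sampling for a Cauchy location parameter. The Cauchy family is chosen because it has no fixed-dimension sufficient statistic, so every importance weight requires a full pass over the data. This is not the adaptive algorithm developed in the thesis. The model, the N(0, 10^2) prior, the normal proposal, and all function names are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def cauchy_loglik(theta, data):
    # Cauchy(theta, 1) log-likelihood. No low-dimensional sufficient statistic
    # exists, so each call loops over every observation: O(len(data)) work.
    return sum(-math.log(math.pi) - math.log1p((x - theta) ** 2) for x in data)


def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def importance_sample_posterior(data, n_draws, prop_mu, prop_sigma,
                                prior_sigma=10.0, seed=0):
    """Self-normalized importance sampling for p(theta | data).

    Hypothetical setup: prior theta ~ N(0, prior_sigma^2),
    proposal q = N(prop_mu, prop_sigma^2).
    Returns (estimated posterior mean, effective sample size).
    """
    rng = random.Random(seed)
    draws, log_w = [], []
    for _ in range(n_draws):
        theta = rng.gauss(prop_mu, prop_sigma)
        # log weight = log prior + log likelihood - log proposal density
        lw = (normal_logpdf(theta, 0.0, prior_sigma)
              + cauchy_loglik(theta, data)
              - normal_logpdf(theta, prop_mu, prop_sigma))
        draws.append(theta)
        log_w.append(lw)
    # Subtract the max log weight before exponentiating to avoid underflow.
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    post_mean = sum(wi * t for wi, t in zip(w, draws))
    ess = 1.0 / sum(wi ** 2 for wi in w)
    return post_mean, ess


def simulate_cauchy(n, loc, seed=1):
    # Inverse-CDF draws from Cauchy(loc, 1).
    rng = random.Random(seed)
    return [loc + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


if __name__ == "__main__":
    data = simulate_cauchy(1000, loc=3.0)
    # Center the proposal at the sample median, a robust location estimate.
    mean, ess = importance_sample_posterior(data, n_draws=400,
                                            prop_mu=statistics.median(data),
                                            prop_sigma=0.2)
    print(f"posterior mean ~ {mean:.3f}, ESS ~ {ess:.1f} of 400 draws")
```

Each draw costs one O(n) likelihood evaluation, so total work scales with (draws) x (observations). The effective sample size (ESS) shows how many of those expensive draws actually carry weight. Tuning the proposal adaptively targets the first factor, and data reduction such as the likelihood clustering mentioned in the abstract targets the second.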
