
dc.contributor.author: Ridgeway, Gregory Kirk, 1973-
dc.date.accessioned: 2009-10-06T22:55:01Z
dc.date.available: 2009-10-06T22:55:01Z
dc.date.issued: 1999
dc.identifier.other: b43914652
dc.identifier.other: 44004145
dc.identifier.other: Thesis 48720
dc.identifier.uri: http://hdl.handle.net/1773/8986
dc.description: Thesis (Ph. D.)--University of Washington, 1999
dc.description.abstract: In recent years statisticians, computational learning theorists, and engineers have developed more advanced techniques for learning complex non-linear relationships from datasets. However, not only have models increased in complexity, but datasets have also outgrown many of the computational methods for fitting those models that are in standard statistical practice. The first several sections of this dissertation show how boosting, a technique originating in computational learning theory, has wide application for learning non-linear relationships even when the datasets are potentially massive. I describe particular applications of boosting for naive Bayes classification and regression, and for exponential family and proportional hazards regression models. I also show how these methods may easily incorporate many desirable properties, including robust regression, variance reduction methods, and interpretability. On both real and simulated datasets, and in a variety of modeling frameworks, boosting consistently outperforms standard methods in terms of error on validation datasets. In separate but related work, the last chapter presents ideas for applying Bayesian methods to inference in massive datasets. Modern Bayesian analysis relies on Monte Carlo methods for sampling from complex posterior distributions, such as those arising in Bayesian hierarchical models. These methods slow down dramatically when the posterior distribution conditions on a large dataset that cannot be summarized by a small number of sufficient statistics. I develop an adaptive importance sampling algorithm that efficiently simulates draws from a posterior distribution conditioned on a massive dataset. I also propose a method for approximate Bayesian inference using likelihood clustering for data reduction. (Illustrative sketches of the boosting and adaptive importance sampling ideas follow this record.)
dc.format.extent: vii, 176 p.
dc.language.iso: en_US
dc.rights: Copyright is held by the individual authors.
dc.rights.uri: For information on access and permissions, please see http://digital.lib.washington.edu/rw-faq/rights.html
dc.subject.other: Theses--Statistics
dc.title: Generalization of boosting algorithms and applications of Bayesian inference for massive datasets
dc.type: Thesis
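
The abstract frames boosting as a general device for learning non-linear relationships by combining simple learners. As a concrete illustration only, here is a minimal sketch of gradient boosting for squared-error regression with decision stumps; the stump learner, learning rate, and round count are illustrative assumptions, not the dissertation's actual formulation.

```python
# A minimal sketch of gradient boosting for squared-error regression:
# each round fits a stump to the current residuals (the negative gradient
# of squared-error loss) and adds a damped copy to the ensemble.
import numpy as np

def fit_stump(x, residual):
    """Find the single-split regression stump that best fits the residual."""
    best = None
    order = np.argsort(x)
    xs, rs = x[order], residual[order]
    for i in range(1, len(xs)):
        left, right = rs[:i].mean(), rs[i:].mean()
        sse = ((rs[:i] - left) ** 2).sum() + ((rs[i:] - right) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, xs[i - 1], left, right)
    _, split, left, right = best
    return lambda z: np.where(z <= split, left, right)

def boost(x, y, n_rounds=100, learning_rate=0.1):
    """Additively combine stumps, each fit to the current residuals."""
    prediction = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction          # negative gradient of squared error
        stump = fit_stump(x, residual)
        prediction += learning_rate * stump(x)
        stumps.append(stump)
    return lambda z: y.mean() + learning_rate * sum(s(z) for s in stumps)

# Toy usage: recover a non-linear relationship from noisy data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = np.sin(x) + rng.normal(scale=0.2, size=500)
model = boost(x, y)
print("training RMSE:", np.sqrt(np.mean((model(x) - y) ** 2)))
```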
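The last chapter's idea, as summarized in the abstract, is to simulate posterior draws by importance sampling while adapting the proposal. The toy below is a loose, assumption-laden sketch, not the dissertation's algorithm: a normal-mean posterior (a conjugate case where the exact answer is known, so the adaptation can be checked), a Gaussian proposal initialized from a pilot subsample, and moment-matching updates.

```python
# A sketch of adaptive importance sampling for a posterior: draw from a
# proposal, weight by posterior/proposal, then refit the proposal to the
# weighted draws and repeat.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)   # stand-in "massive" dataset
n, xbar = len(data), data.mean()

def log_posterior(theta):
    # Log posterior of a normal mean with known unit variance and a flat
    # prior, up to a constant. This toy admits sufficient statistics
    # (n, xbar), so the exact posterior N(xbar, 1/n) is available as a check;
    # the interesting regime in the abstract is when no such summary exists.
    return -0.5 * n * (theta - xbar) ** 2

# Initialize the Gaussian proposal from a cheap pilot subsample, deliberately
# over-dispersed so the early rounds can find the posterior mass.
pilot = data[:1000]
mu, sigma = pilot.mean(), 5.0 / np.sqrt(len(pilot))

for _ in range(10):
    draws = rng.normal(mu, sigma, size=2_000)
    log_q = -0.5 * ((draws - mu) / sigma) ** 2 - np.log(sigma)
    log_w = log_posterior(draws) - log_q           # importance weights (log scale)
    w = np.exp(log_w - log_w.max())                # stabilize before normalizing
    w /= w.sum()
    # Adapt: refit the proposal to the weighted draws by moment matching.
    mu = float(np.sum(w * draws))
    sigma = max(float(np.sqrt(np.sum(w * (draws - mu) ** 2))), 1e-6)

print(f"adapted proposal: mean={mu:.4f}, sd={sigma:.6f}")
print(f"exact posterior:  mean={xbar:.4f}, sd={1.0 / np.sqrt(n):.6f}")
```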

