A unified approach to model-agnostic variable importance

dc.contributor.advisor: Carone, Marco
dc.contributor.advisor: Simon, Noah R
dc.contributor.author: Williamson, Brian
dc.date.accessioned: 2020-02-04T19:24:34Z
dc.date.issued: 2020-02-04
dc.date.submitted: 2019
dc.description: Thesis (Ph.D.)--University of Washington, 2019
dc.description.abstract: Assessing the relative contribution of subsets of features towards predicting the response is often of interest in predictive modeling applications; this contribution is typically referred to as variable importance. Often, simple population models are used because the associated variable importance measure is easy to interpret; however, estimates may be misleading if the model used is overly simplistic. In an effort to improve prediction performance, complex prediction algorithms are often used instead; however, in these cases variable importance is often defined as a function of the algorithm rather than a summary of the population, rendering formal statistical inference on population importance difficult. In this dissertation, we propose a unified model-agnostic framework for statistical inference on population-level variable importance. Specifically, we define variable importance as a contrast between the predictiveness of the best possible prediction function based on all available features versus all features but those under consideration. We discuss general conditions under which a simple estimator of this importance is nonparametric efficient and allows the construction of valid confidence intervals. We also propose a valid strategy for hypothesis testing. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
dc.embargo.lift: 2022-01-24T19:24:34Z
dc.embargo.terms: Restrict to UW for 2 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Williamson_washington_0250E_20880.pdf
dc.identifier.uri: http://hdl.handle.net/1773/45122
dc.language.iso: en_US
dc.rights: CC BY-NC-ND
dc.subject: machine learning
dc.subject: nonparametric statistics
dc.subject: variable importance
dc.subject: Biostatistics
dc.subject: Statistics
dc.subject.other: Biostatistics
dc.title: A unified approach to model-agnostic variable importance
dc.type: Thesis

Files

Original bundle

Name: Williamson_washington_0250E_20880.pdf
Size: 10.36 MB
Format: Adobe Portable Document Format

Collections