Carone, Marco
Simon, Noah R
Williamson, Brian
2020-02-04
2020-02-04
2019
Williamson_washington_0250E_20880.pdf
http://hdl.handle.net/1773/45122
Thesis (Ph.D.)--University of Washington, 2019
Assessing the relative contribution of subsets of features towards predicting the response is often of interest in predictive modeling applications; this contribution is typically referred to as variable importance. Often, simple population models are used because the associated variable importance measure is easy to interpret; however, estimates may be misleading if the model used is overly simplistic. In an effort to improve prediction performance, complex prediction algorithms are often used instead; however, in these cases variable importance is often defined as a function of the algorithm rather than a summary of the population, rendering formal statistical inference on population importance difficult. In this dissertation, we propose a unified model-agnostic framework for statistical inference on population-level variable importance. Specifically, we define variable importance as a contrast between the predictiveness of the best possible prediction function based on all available features versus all features but those under consideration. We discuss general conditions under which a simple estimator of this importance is nonparametric efficient and allows the construction of valid confidence intervals. We also propose a valid strategy for hypothesis testing. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
application/pdf
en-US
CC BY-NC-ND
machine learning
nonparametric statistics
variable importance
Biostatistics
Statistics
Biostatistics
A unified approach to model-agnostic variable importance
Thesis
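
The definition of variable importance given in the abstract can be written compactly. What follows is a sketch in notation assumed here for illustration, not taken from this record: $P_0$ denotes the data-generating distribution, $V(f, P)$ a predictiveness measure (for example $R^2$ or classification accuracy), $\mathcal{F}$ a class of candidate prediction functions, and $\mathcal{F}_s \subseteq \mathcal{F}$ the functions in that class that ignore the features indexed by $s$. The `\operatorname*` command assumes the `amsmath` package.

```latex
\[
  \psi_{0,s} \;=\; V(f_0, P_0) \;-\; V(f_{0,s}, P_0),
  \qquad
  f_0 \in \operatorname*{arg\,max}_{f \in \mathcal{F}} V(f, P_0),
  \quad
  f_{0,s} \in \operatorname*{arg\,max}_{f \in \mathcal{F}_s} V(f, P_0).
\]
```

Under these assumptions, $\mathcal{F}_s \subseteq \mathcal{F}$ implies $\psi_{0,s} \ge 0$, and a value of zero means the features in $s$ add no predictiveness beyond the remaining features.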