Bayesian methods for variable selection
Authors
Porwal, Anupreet
Abstract
Choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process and are required for common statistical tasks such as parameter estimation, interval estimation, statistical inference, point prediction and interval prediction. A canonical example is the variable selection problem in linear regression. Many approaches have been proposed, including Bayesian and penalized regression methods. Each has advantages and disadvantages, and the trade-offs are not always well understood. In this dissertation, we first compare 21 popular existing methods via an extensive simulation study based on a wide range of real datasets. We find that three adaptive Bayesian model averaging (BMA) methods perform best across all the statistical tasks. We then investigate the effect of model space priors on model inference under the BMA framework, considering eight reference model space priors used in the literature together with the three adaptive parameter priors recommended by the previous study. Additionally, we propose a novel objective prior based on power-expected-posterior priors for generalized linear models that relies on a Laplace expansion of the likelihood of the imaginary training sample. We investigate both the asymptotic and finite-sample properties of the resulting procedures, showing that they are both asymptotically and intrinsically consistent, and that their performance is superior to that of alternative approaches in the literature, especially for heavy-tailed versions of the priors. Finally, we propose a framework that unifies the two Bayesian paradigms for inducing sparsity, namely (mixtures of) $g$-priors and continuous shrinkage priors.
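For context, the BMA framework mentioned above averages inference over candidate models; in standard notation (not taken from this abstract), with candidate models $M_1,\dots,M_K$ and a quantity of interest $\Delta$,
\[
p(\Delta \mid y) \;=\; \sum_{k=1}^{K} p(\Delta \mid M_k, y)\, p(M_k \mid y),
\qquad
p(M_k \mid y) \;\propto\; p(y \mid M_k)\, p(M_k),
\]
so the two prior choices studied here enter separately: the model space prior through $p(M_k)$, and the parameter prior through the marginal likelihood $p(y \mid M_k)$.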
The mixture of $g$-priors uses a single shrinkage parameter across all predictors included in the model, incorporates the correlation structure of the covariates into the prior, and allows for model selection, but suffers from the Conditional Lindley Paradox (CLP). Continuous shrinkage priors such as the horseshoe prior, on the other hand, allow a different shrinkage parameter for each coefficient but do not perform model selection. We propose global local-$g$ priors that borrow strength from the two paradigms and allow differential shrinkage across predictors while performing model selection. Additionally, we propose Dirichlet process (DP) block-$g$ priors, which combine $g$-priors with Bayesian nonparametric tools to incorporate correlation structure into the prior and to adaptively identify and cluster predictors with varying degrees of relevance, using a different shrinkage parameter for each cluster. We show empirically and theoretically that our proposed priors avoid the CLP while performing competitively with, or better than, existing methods in terms of model selection, parameter estimation and prediction.
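The contrast between the two paradigms can be made concrete in standard notation (not given in the abstract) for a Gaussian linear model $y = X\beta + \varepsilon$. The $g$-prior ties all coefficients to one shrinkage parameter $g$ and uses the design matrix in its covariance,
\[
\beta \mid g, \sigma^2 \;\sim\; \mathrm{N}_p\!\left(0,\; g\,\sigma^2 \,(X^\top X)^{-1}\right),
\]
whereas the horseshoe prior assigns each coefficient its own local scale $\lambda_j$ alongside a global scale $\tau$,
\[
\beta_j \mid \lambda_j, \tau, \sigma^2 \;\sim\; \mathrm{N}\!\left(0,\; \lambda_j^2\, \tau^2\, \sigma^2\right),
\qquad
\lambda_j \;\sim\; \mathrm{C}^{+}(0,1),
\]
with $\mathrm{C}^{+}(0,1)$ a half-Cauchy distribution. The priors proposed in this dissertation sit between these forms: per-predictor (or per-cluster) shrinkage as in the second display, combined with the design-dependent structure and model selection of the first.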
Description
Thesis (Ph.D.)--University of Washington, 2023
