Methods for Agnostic Statistical Inference

Rice, Kenneth; Li, Qijun
2021-08-26
Li_washington_0250E_22855.pdf
http://hdl.handle.net/1773/47360
Thesis (Ph.D.)--University of Washington, 2021

A traditional goal of parametric statistics is to estimate some or all of a data-generating model's finite set of parameters, thereby turning data into scientific insights. Point estimates of parameters and their corresponding standard error estimates are used to quantify the information provided by the data. In reality, however, the true data-generating process rarely follows the assumed model. When the model assumptions are incorrect, the point estimates' target parameters are often still meaningful quantities, but the model-based standard error estimates may be difficult to interpret in any helpful way; they may also be notably biased. In light of doubts about model assumptions and the known difficulties of checking models, agnostic statistics aims to develop statistical inference with only minimal assumptions. Though agnostic statistics has become popular over the past few decades, methods of agnostic inference are yet to be developed in some fundamental application areas. One such area is meta-analysis. Meta-analysis of 2-by-2 tables is common and useful in research topics including analysis of adverse events and survey research data. Fixed-effects inference typically centers on measures of association such as the Cochran-Mantel-Haenszel statistic or Woolf's estimator, but to obtain well-calibrated inference when studies are small, most methods rely on assuming exact homogeneity across studies, which is often unrealistic. We show that the estimators from several widely used methods have meaningful estimands even in the presence of heterogeneity, and we derive improved confidence intervals for them under heterogeneity. These improvements over current methods are illustrated by simulation.
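As a point of reference for the fixed-effects estimators named above, the following is a minimal sketch of the Mantel-Haenszel and Woolf pooled odds ratios for a set of 2-by-2 tables. It shows only the standard point estimators, not the confidence intervals derived in the thesis. The function names and the example counts are hypothetical and chosen purely for illustration.

```python
import math

# Each table is (a, b, c, d):
#   a = events in treated,  b = non-events in treated,
#   c = events in control,  d = non-events in control.
# All counts below are made-up illustration data, not from any study.


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across 2-by-2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def woolf_or(tables):
    """Woolf pooled odds ratio: inverse-variance weighted mean of log odds ratios.

    Requires every cell to be non-zero (no continuity correction is applied).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        weight = 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
        weighted_sum += weight * log_or
        weight_total += weight
    return math.exp(weighted_sum / weight_total)


# Two tables sharing the same odds ratio (4/9): exact homogeneity.
homogeneous = [(10, 90, 20, 80), (5, 45, 10, 40)]

# Adding a table with a different odds ratio introduces heterogeneity.
heterogeneous = homogeneous + [(30, 70, 20, 80)]

if __name__ == "__main__":
    print(mantel_haenszel_or(homogeneous), woolf_or(homogeneous))
    print(mantel_haenszel_or(heterogeneous), woolf_or(heterogeneous))
```

Under exact homogeneity both estimators recover the common odds ratio. Once the tables differ, the two estimators weight the studies differently and generally give different pooled values, which is why the question of what each one estimates under heterogeneity matters.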
We find that our confidence intervals provide coverage closer to the nominal level when heterogeneity is present, in both small-sample and large-sample settings. The conventional confidence intervals derived under homogeneity are often conservative, though anti-conservative inferences occur in some scenarios. We also apply the proposed methods to a meta-analysis of 19 randomized clinical trials on the effect of sclerotherapy in preventing first bleeding in patients with cirrhosis and esophagogastric varices. Our methods provide a more interpretable approach to meta-analyzing binary data and characterize the uncertainty of the estimates more accurately. Another area lacking agnostic methods is adaptive shrinkage estimation. Shrinkage estimation attempts to increase the precision of an estimator in exchange for introducing a modest bias. Standard shrinkage estimators in linear models include the James-Stein estimator, the Ridge estimator, and the LASSO. However, theory regarding the optimal amount of shrinkage and the statistical properties of these estimators is often based on stringent distributional assumptions. In Chapter 3, we provide a unified framework for shrinkage estimation: penalized precision-weighted least-squares estimation. We demonstrate that the James-Stein estimator, Ridge, and LASSO are all penalized precision-weighted least-squares estimators using model-based precision weights. Using a model-agnostic precision weighting matrix, we propose three shrinkage estimators in the novel framework: the Rotated James-Stein estimator, Rotated Ridge, and Rotated LASSO. As we show, the three proposed estimators have theoretical properties and empirical performance comparable to those of the standard shrinkage estimators, while Rotated LASSO has improved precision in some situations. We apply these estimators in a prostate cancer example. The third area is variance estimation in Bayesian inference.
Many frequentist parametric statistical methods have large-sample Bayesian analogs. However, there is no general Bayesian analog of the "robust" covariance estimates that are widely used in frequentist work. In Chapter 4, we propose such an analog, produced as the Bayes rule under a form of balanced loss function. This loss combines standard parametric inference's goal of accurate estimation of the truth with the less standard goal of fidelity of the data to the model. The Bayesian robust standard error is the large-sample equivalent of its frequentist counterpart, and we show by simulation that it can also be used to construct Wald confidence intervals with improved small-sample coverage. We demonstrate the novel standard error estimates in a Bayesian linear regression model to study the association between systolic blood pressure and age in 2017-2018 NHANES data.

application/pdf; en-US; CC BY
Decision theory; Meta-analysis; Regression modeling; Robust statistics; Public health; Biostatistics
Thesis
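For readers unfamiliar with the frequentist "robust" covariance estimates mentioned in the abstract, the sketch below contrasts the model-based standard error of a simple linear regression slope with the standard heteroskedasticity-robust (HC0 "sandwich") standard error. This is the familiar frequentist form, not the Bayesian analog proposed in Chapter 4. The function name and the data are hypothetical illustration values, not NHANES measurements.

```python
import math


def slope_with_standard_errors(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, model_based_se, robust_se), where robust_se is the
    HC0 sandwich standard error of the slope.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Model-based: assumes a common error variance, estimated by RSS / (n - 2).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    model_based_se = math.sqrt(sigma2 / sxx)

    # HC0 sandwich: each squared residual stands in for its own error variance.
    meat = sum((xi - x_bar) ** 2 * r * r for xi, r in zip(x, residuals))
    robust_se = math.sqrt(meat / sxx ** 2)

    return slope, model_based_se, robust_se


# Made-up data whose scatter grows with x.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [1.1, 1.9, 3.4, 3.5, 5.9]

if __name__ == "__main__":
    slope, se_model, se_robust = slope_with_standard_errors(x_demo, y_demo)
    print(slope, se_model, se_robust)
```

The two standard errors target the same quantity only when the constant-variance assumption holds; the sandwich form remains interpretable as a measure of the slope estimator's sampling variability when it does not.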