R-squared inference under non-normal error
Abstract
Assessment of the relationship between diet and health status, especially the association between diet and chronic disease risk, has attracted considerable research interest in statistical and epidemiologic studies. However, because of measurement errors in the commonly used self-reported assessment approaches, the expected strong relationship has not been identified in most studies. Developments in biomarker measures provide objective assessments of consumption for specific dietary components, and these are used to develop calibrated dietary consumption functions that remove the bias embedded in self-reported dietary measures. Researchers are interested in the explanatory strength of calibration equations and in comparing that strength among various self-report measures. Because R-squared is a common metric in these studies, reliable estimation of R-squared and of its confidence interval is important. Inference for R-squared, including confidence intervals, has not attracted much attention in the statistical literature. In this dissertation we propose two methods to estimate confidence intervals for R-squared under errors from normal and non-normal distributions: the first method is based on asymptotic theory and entails the development of the asymptotic distribution of R-squared, and of relevant functions of it, as the sample size becomes large; the second approach is based on a general F-test applied to linear regression but adjusts the degrees-of-freedom parameters of the F-test statistic using the empirical skewness and kurtosis of the regression errors. In addition, when there are measurement errors in the independent variables, R-squared estimated directly from the regression can be biased and may, for example, underestimate the relationship between dependent and independent variables even with normally distributed errors. This dissertation also proposes a correction methodology to reduce the bias in R-squared estimation in the presence of classical additive measurement errors.
The proposed methodologies have been evaluated in simulation and applied to nutritional biomarker studies in the Women's Health Initiative.
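
As an illustration of the measurement-error bias described above, the following is a minimal simulation sketch and not the correction method developed in the dissertation. It assumes a simple linear regression with a single predictor, normally distributed errors, and a known measurement-error variance; the function names and parameter values are hypothetical. In that textbook setting, classical additive error multiplies the population R-squared by the reliability ratio var(X) / (var(X) + var(U)), so with the defaults below the true value of 0.5 is attenuated to 0.25, and dividing the naive estimate by the reliability ratio undoes the attenuation.

```python
import numpy as np


def r_squared(x, y):
    """R-squared of the simple linear regression of y on x (squared correlation)."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def simulate(n=100_000, beta=1.0, sd_x=1.0, sd_e=1.0, sd_u=1.0, seed=0):
    """Compare R-squared using the true predictor and an error-prone version.

    All parameter values are illustrative assumptions, not taken from the source.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x, n)             # true (unobserved) exposure
    y = beta * x + rng.normal(0.0, sd_e, n)  # outcome
    w = x + rng.normal(0.0, sd_u, n)         # classical additive measurement error

    # Reliability ratio, treated here as known; in practice it must be estimated.
    reliability = sd_x**2 / (sd_x**2 + sd_u**2)

    r2_true = r_squared(x, y)             # what we would like to estimate
    r2_naive = r_squared(w, y)            # attenuated by measurement error
    r2_corrected = r2_naive / reliability # simple disattenuation
    return r2_true, r2_naive, r2_corrected


r2_true, r2_naive, r2_corrected = simulate()
print(f"true: {r2_true:.3f}  naive: {r2_naive:.3f}  corrected: {r2_corrected:.3f}")
```

The sketch treats the reliability ratio as known, which is rarely the case in practice; estimating it, for example from replicate or biomarker measurements, adds uncertainty that any confidence interval for the corrected R-squared would need to reflect.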