Methods for estimation and inference for high-dimensional models
This thesis tackles three different problems in high-dimensional statistics. The first two parts of the thesis focus on estimation of sparse high-dimensional undirected graphical models under non-standard conditions, specifically non-Gaussianity and missingness, when observations are continuous. To address estimation under non-Gaussianity, we propose a general framework that augments the score matching losses introduced in Hyvärinen [2005, 2007] with an l1-regularizing penalty. This method, which we refer to as regularized score matching, allows for computationally efficient treatment of Gaussian and non-Gaussian continuous exponential family models because the considered loss becomes a penalized quadratic and thus yields piecewise linear solution paths. Under suitable irrepresentability conditions and distributional assumptions, we show that regularized score matching generates consistent graph estimates in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models.
To address estimation of sparse high-dimensional undirected graphical models with missing observations, we propose adapting the regularized score matching framework by substituting surrogates for the relevant statistics, as in the work of Loh and Wainwright and of Kolar and Xing. For Gaussian and non-Gaussian continuous exponential family models, the use of these surrogates may result in a loss of positive semidefiniteness, and thus nonconvexity, in the objective. Nevertheless, under suitable distributional assumptions, the global optimum is close to the truth in matrix l1 norm with high probability in sparse high-dimensional settings. Furthermore, under the same set of assumptions, we show that the composite gradient descent algorithm we propose for minimizing the modified objective converges at a geometric rate to a solution close to the global optimum with high probability.
The last part of the thesis moves away from undirected graphical models and is instead concerned with inference in high-dimensional regression models. Specifically, we investigate how to construct asymptotically valid confidence intervals and p-values for the fixed effects in a high-dimensional linear mixed-effects model. The framework we propose, largely founded on recent work by Bühlmann [2013], entails de-biasing a 'naive' ridge estimator. We show via numerical experiments that the method controls Type I error in hypothesis testing and generates confidence intervals that achieve target coverage, outperforming competitors that assume observations are independent when they are, in fact, correlated within groups.
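As a rough illustration of why the regularized score matching loss is a penalized quadratic, the sketch below treats only the centered Gaussian case with complete data, where the score matching loss for a precision matrix K reduces to (1/2) tr(K S K) - tr(K) with S the sample covariance. It minimizes this loss plus an l1 penalty on the off-diagonal entries by plain proximal gradient steps. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the function names, the fixed step size, the iteration count, and the choice to leave the diagonal unpenalized are all illustrative, and the thesis itself works with piecewise linear solution paths and, for the missing-data objective, a composite gradient descent scheme.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal map of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def gaussian_score_matching_l1(X, lam, n_iter=500):
    """Hypothetical helper: proximal gradient on
        0.5 * tr(K S K) - tr(K) + lam * sum_{j != k} |K_jk|,
    where S is the (biased) sample covariance of the rows of X.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    # The gradient of the quadratic part is Lipschitz with constant
    # lambda_max(S), so 1 / lambda_max(S) is a safe fixed step size.
    step = 1.0 / np.linalg.eigvalsh(S)[-1]
    off_diag = ~np.eye(p, dtype=bool)
    K = np.eye(p)
    for _ in range(n_iter):
        # Gradient of 0.5 * tr(K S K) - tr(K) at a symmetric K.
        grad = 0.5 * (S @ K + K @ S) - np.eye(p)
        Z = K - step * grad
        # Assumption: only off-diagonal entries are penalized.
        Z[off_diag] = soft_threshold(Z[off_diag], step * lam)
        # Guard against floating-point asymmetry.
        K = 0.5 * (Z + Z.T)
    return K
```

With lam set to zero the iterates approach the inverse of S whenever S is invertible, and larger values of lam set more off-diagonal entries of the estimate exactly to zero; the zero pattern is then read as the estimated graph.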
- Statistics