Parameter Selection of Sparse Functional Principal Component Analysis with fMRI data
Han, Joo Yoon
With the advent of high-throughput biotechnologies, it is increasingly common in modern biomedical studies for the number of features measured on each subject to vastly exceed the number of subjects. In this manuscript we focus on these high-dimensional issues for brain imaging data. Principal component analysis (PCA) is commonly used to reduce the dimension of the data and to examine its major patterns. However, with such high-dimensional data, PCA is inconsistent (Johnstone & Lu 2009). Moreover, when the underlying patterns are smooth and sparse, PCA is unable to detect them properly. Sparse and smooth PCA, in which the principal components are linear combinations of a subset of the features (with coefficient values that are spatially smooth), may therefore be of interest for high-dimensional data. For fMRI data in particular, which are collected across time and brain regions, smooth principal components can reveal the major temporal patterns, and identifying the specific brain regions associated with those patterns may also be of interest. Allen (2013) introduced an optimization problem for this scenario, sparse and functional principal component analysis (SFPCA), which encourages the row and/or column factors to be sparse and smooth. We apply SFPCA to a brain imaging data matrix X of size n × p, with n regions and p time points, where the row factors are sparse and the column factors are smooth. The SFPCA problem involves three regularization parameters: the sparsity parameter, the smoothing parameter, and the number of components. The main goal of this thesis is to develop an automated method to select these regularization parameters. The method is based on cross-validation; however, cross-validation for an unsupervised problem is not trivial. We leverage the time structure of brain imaging data to estimate the held-out time points in the test set.
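To make the optimization problem concrete, the block below writes the rank-one SFPCA criterion as we understand Allen's (2013) general formulation, in our own notation, and then specializes it to the setting above (sparse row factor u over regions, smooth column factor v over time). The choice of the l1 penalty for P_u, the second-difference matrix D for Omega_v, and all symbols here are illustrative assumptions, not a transcription of the thesis. In this special case, lambda_u plays the role of the sparsity parameter and alpha_v the smoothing parameter.

```latex
% Rank-one SFPCA criterion (general form, as we understand Allen 2013)
\max_{u,\,v}\; u^\top X v - \lambda_u P_u(u) - \lambda_v P_v(v)
\quad \text{s.t.} \quad u^\top S_u u \le 1,\quad v^\top S_v v \le 1,
\qquad S_u = I + \alpha_u \Omega_u,\quad S_v = I + \alpha_v \Omega_v .

% Assumed special case: sparse rows, smooth columns
P_u(u) = \|u\|_1,\quad \alpha_u = 0,\quad \lambda_v = 0,\quad
\Omega_v = D^\top D,\quad (Dv)_t = v_t - 2v_{t+1} + v_{t+2},\; t = 1,\dots,p-2 .

% Alternating updates, each an exact maximizer with the other factor fixed
u \leftarrow \frac{\mathcal{S}(Xv,\lambda_u)}{\|\mathcal{S}(Xv,\lambda_u)\|_2},
\qquad \mathcal{S}(a,\lambda)_i = \operatorname{sign}(a_i)\,\max(|a_i| - \lambda,\, 0),

v \leftarrow \frac{S_v^{-1} X^\top u}{\sqrt{u^\top X S_v^{-1} X^\top u}} .
```

Because each update maximizes the criterion exactly in one factor, the objective cannot decrease across iterations; if the soft-threshold returns the zero vector, u is set to zero, which signals that lambda_u is too large for that component.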
We also define a cross-validated proportion of variance explained for our problem and use it to select an appropriate number of components (and the regularization parameters for those components). We search for the regularization parameters sequentially, component by component. We compare the performance of SFPCA (with our selected tuning parameters) to that of classical PCA under different signal-to-noise ratios (SNR). For sparse and smooth data, SFPCA substantially outperforms PCA (classical PCA gives estimates that are not sparse and are much too rough). As expected, SFPCA performance improves as the SNR increases. In addition, as the SNR increases, the cross-validated proportion of variance explained more accurately estimates the true proportion of variance explained. Our simulation studies show that sufficient signal is needed to estimate the factors properly with SFPCA, and that reasonable candidate values of the regularization parameters are also needed.
- Biostatistics