Dimension Reduction for Spatially-Misaligned and Multi-Pollutant Data with Missing Observations
Authors
Vu, Phuong Thu
Abstract
Accurate predictions of pollutant concentrations at new locations are often of interest in air pollution studies, in which data are usually not measured at all study locations. Ambient air is also a mixture of many chemical components, which can modify the associations between its total mass and various health outcomes. Principal component analysis (PCA) can be used to obtain lower-dimensional representative scores of the multi-pollutant data, and spatial prediction models can then be used to estimate these scores at new locations. The recently developed predictive PCA (PredPCA) modifies the traditional PCA algorithm to improve overall predictive performance. However, these approaches require complete data, whereas multi-pollutant data tend to have complex missing-data patterns. In the first part of this dissertation, we propose a probabilistic version of PredPCA that handles incomplete data directly, using flexible model-based imputation that accounts for geographic and spatial information. In the second part, we reformulate the PredPCA algorithm as a convex optimization problem by incorporating spatial information into the low-rank matrix completion framework. The advantages of our proposed method include simultaneous estimation of all components, orthogonality, and a mechanism for handling missing data. Finally, we leverage these core ideas to modify an existing technique in low-rank tensor approximation to handle misaligned spatiotemporal data.
Description
Thesis (Ph.D.)--University of Washington, 2019
