Dimension Reduction for Spatially-Misaligned and Multi-Pollutant Data with Missing Observations

Authors

Vu, Phuong Thu

Abstract

Accurate predictions of pollutant concentrations at new locations are often of interest in air pollution studies, in which data are usually not measured at all study locations. Ambient air is also a mixture of many chemical components, which can modify the associations between its total mass and various health outcomes. Principal component analysis (PCA) can be used to obtain lower-dimensional scores that represent the multi-pollutant data. Spatial prediction models can then be used to estimate these scores at new locations. The recently developed predictive PCA (PredPCA) modifies the traditional algorithm to improve overall predictive performance. However, these approaches require complete data, whereas multi-pollutant data tend to have complex missing patterns. In the first part of this dissertation, we propose a probabilistic version of PredPCA that can directly handle incomplete data with flexible model-based imputation accounting for geographic and spatial information. In the second part, we reformulate the PredPCA algorithm as a convex optimization problem by incorporating spatial information into the low-rank matrix completion framework. The advantages of our proposed method include simultaneous estimation of all components, orthogonality, and a mechanism to handle missing data. Finally, we leverage these core ideas to modify an existing technique in low-rank tensor approximation to handle misaligned spatiotemporal data.
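
For readers unfamiliar with the low-rank matrix completion framework mentioned in the abstract, the following is a minimal sketch of a generic nuclear-norm-regularized completion routine (iterative soft-thresholding of singular values). It is not the dissertation's algorithm: it uses no spatial or geographic information, and the function name `soft_impute`, the penalty `lam`, and the simulated sites-by-pollutants matrix are all assumptions made for illustration only.

```python
import numpy as np


def soft_impute(X, lam=0.1, n_iter=200, tol=1e-6):
    """Return a low-rank estimate of X, where missing entries of X are NaN.

    Generic illustration only (hypothetical helper, no spatial structure).
    """
    observed = ~np.isnan(X)
    Z = np.zeros_like(X, dtype=float)  # current low-rank estimate
    for _ in range(n_iter):
        # Keep observed values; fill missing ones with the current estimate.
        filled = np.where(observed, X, Z)
        # Soft-threshold the singular values (proximal step for the nuclear norm).
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        Z_new = (U * s) @ Vt
        # Stop when the estimate changes very little between iterations.
        change = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if change <= tol * max(np.linalg.norm(Z), 1.0):
            break
    return Z


# Simulated example (assumed data): 30 sites, 8 pollutants, rank-2 signal,
# with roughly 20% of the entries removed at random.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
X = truth.copy()
missing = rng.random(X.shape) < 0.2
X[missing] = np.nan

Z = soft_impute(X, lam=0.1)
obs = ~missing
rel_err_observed = np.linalg.norm((Z - truth)[obs]) / np.linalg.norm(truth[obs])
```

The completed matrix `Z` has no missing entries, and its leading singular vectors play the role that principal component loadings play in complete-data PCA. A method of the kind described in the abstract would additionally bring spatial information into the optimization, which this sketch does not attempt.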

Description

Thesis (Ph.D.)--University of Washington, 2019
