Handling missing values in risk prediction modeling: a comparative simulation study on parametric and machine learning multiple imputations
Authors
Wu, Yuxin
Abstract
Risk prediction is a critical tool in preventive medicine, enabling precision prevention for diseases. Electronic health record (EHR) data offer a rich source for constructing risk models, capturing detailed clinical information from patient cohorts. However, missing data are a prevalent challenge in EHR analysis, and multiple imputation (MI) is a popular strategy for handling them. In this thesis, we used simulations to compare four MI methods (parametric MI, MI using Random Forest, MI using Gradient Boosting Machines, and MI using Principal Component Analysis) in the context of risk prediction modeling. Our investigation focused on predictive performance, encompassing measures of predictive accuracy and precision, for risk prediction models developed and assessed in datasets processed with the various MI strategies. We further explored two questions: (1) the impact of including or omitting the outcome variable during MI, and (2) the impact of misspecifying higher-order effects in the imputation model. To complement the simulation study, we also used surveillance mammogram examination data from breast cancer survivors in the Breast Cancer Surveillance Consortium (BCSC) as the input for part of the bootstrapping and for a data illustration. Our results showed that machine learning-based imputation methods did not lead to better model performance than traditional parametric imputation. We recommend against including the outcome variable in the imputation model for the test set, since doing so may raise concerns about over-optimistic predictive performance. Although it is not the focus of this thesis, we also recommend being cautious about using Random Forest as the risk prediction model in similar prediction modeling settings.
Description
Thesis (Master's)--University of Washington, 2023
