Survey-Based Methodologies for Enhanced Assessment of Cause of Death

Fan, Shuxian

Date

2025-08-01

Abstract

This dissertation explores several statistical challenges in cause-of-death (COD) assessment from verbal autopsy (VA) surveys, which are structured interviews with caregivers of the deceased conducted in regions where traditional medical certification is unavailable. Although VA surveys play a crucial role in mortality surveillance, analysis of VA data is complicated by inconsistent age categorization, respondent burden from lengthy questionnaires, and potential biases in automated classification systems.

The first project develops a Bayesian framework for reconciling inconsistent age categories across multiple VA data sources. We formulate age-disaggregated death counts as fully classified multinomial data and show that incorporating partially classified aggregated data can produce an improved Bayes estimator under the Kullback-Leibler (KL) loss. Under specific theoretical conditions, this approach calibrates data with different age structures to generate unified estimates of standardized age distributions. Through numerical studies and applications to real-world mortality data, we demonstrate the method's effectiveness in imputing incomplete classifications and guiding appropriate levels of age disaggregation.

The second project adopts Bayesian active questionnaire design to optimize VA data collection. Using posterior-weighted KL information criteria and uncertainty-aware stopping rules, this approach sequentially selects questions to maximize information while minimizing respondent burden. Validation with gold-standard VA data shows comparable classification accuracy using substantially fewer questions, with implications for improved data collection efficiency.

The third project presents a statistical framework for valid inference using causes predicted from VA narratives. By extending prediction-powered inference (PPI) to multinomial classification, we enable unbiased parameter estimation when natural language processing models are used for COD classification. Cross-site validation demonstrates effective correction for transportability errors and highlights the distinction between predictive accuracy and inferential validity.

The fourth project proposes and validates a proof-of-concept Bayesian mixture model for estimating cause-specific mortality with incomplete age stratification. Using age-mixing proportions within a Bayesian framework, this approach shows that incorporating partially observed age data improves estimation compared to discarding incomplete records. Analysis of demographic survey data from multiple countries reveals that the proposed approach generally yields more accurate cause-specific mortality estimates, with performance advantages varying by the actual age distribution of deaths.

Together, these methodological innovations address fundamental challenges in survey-based mortality surveillance, with applications extending beyond COD assessment to broader problems of inference with incomplete or predicted data.
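
The idea behind prediction-powered inference for cause fractions, as described for the third project, can be illustrated with a minimal sketch. It is not the dissertation's implementation; it only shows the standard PPI point estimate applied to one-hot cause indicators, under the assumption that a small set of deaths has both a model-predicted and a reference (gold-standard) cause, while a larger set has predictions only. The function names, the cause labels "A" and "B", and the toy data are all hypothetical.

```python
from collections import Counter


def class_proportions(labels, causes):
    """Empirical fraction of each cause in a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in causes}


def ppi_cause_fractions(pred_unlabeled, pred_labeled, true_labeled, causes):
    """Prediction-powered point estimate of cause fractions.

    estimate = (fractions of predicted causes on unlabeled deaths)
             + (true fractions - predicted fractions on labeled deaths)

    The second term (the "rectifier") estimates the bias of the classifier.
    """
    if len(pred_labeled) != len(true_labeled):
        raise ValueError("labeled predictions and labels must align")
    naive = class_proportions(pred_unlabeled, causes)
    pred_lab = class_proportions(pred_labeled, causes)
    true_lab = class_proportions(true_labeled, causes)
    return {c: naive[c] + (true_lab[c] - pred_lab[c]) for c in causes}


# Hypothetical toy data: the classifier over-predicts cause "A".
causes = ["A", "B"]
pred_unlabeled = ["A"] * 6 + ["B"] * 4        # predictions only
pred_labeled = ["A", "A", "A", "B"]           # predictions on labeled deaths
true_labeled = ["A", "A", "B", "B"]           # reference causes for the same deaths

estimate = ppi_cause_fractions(pred_unlabeled, pred_labeled, true_labeled, causes)
print(estimate)
```

In this toy example the correction shifts mass from "A" to "B" because the classifier over-predicts "A" on the labeled deaths. The corrected fractions still sum to one, but an individual component is not guaranteed to lie in [0, 1], and this sketch gives only a point estimate with no confidence intervals.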