Performance of weakly-supervised electronic health record-based phenotyping methods in rare-outcome settings
Abstract
Background: Electronic health records (EHRs) enable large-scale biomedical research, but key outcomes are often imperfectly captured, a problem that is particularly important for rare outcomes in vaccine safety surveillance. Including features derived from natural language processing of chart notes in the EHR may improve computational phenotyping (prediction) performance. Manual chart review, the gold-standard method for phenotyping, is expensive, which limits the labeled data available to traditional supervised prediction algorithms.
Methods: We evaluated three weakly-supervised phenotyping algorithms (PheNorm, MAP, and SureLDA) across simulated scenarios that varied in disease prevalence (5% vs. 40%), label quality, and data complexity. Performance was measured using discrimination, precision, and calibration metrics across 2,500 replicates per scenario. We also applied probability-guided chart review to 1,028 potential anaphylaxis cases in a proof-of-concept study to assess whether reasonable cohorts for model development and evaluation could be obtained using this method.
Results: Algorithm performance varied by context. Under optimal conditions, PheNorm and MAP achieved AUC > 0.99. In complex, low-prevalence settings, SureLDA variants outperformed the other algorithms (AUC = 0.95 vs. 0.86 for PheNorm and 0.84 for MAP). Chart selection using predicted probabilities enriched for clinically meaningful cases compared to a random sample stratified on two covariates.
Conclusions: Algorithm choice should reflect deployment conditions. SureLDA is robust in complex settings; PheNorm performs well with reliable documentation. Hybrid approaches improve phenotyping accuracy and efficiency in EHR-based vaccine safety surveillance.
Description
Thesis (Master's)--University of Washington, 2025
