Multivariate Inference and Surveillance using Population Scale Data
Recent federal initiatives are incentivizing the routine collection and linkage of electronic health record (EHR) data across clinics, hospitals, and healthcare systems. Large-scale EHR data present numerous opportunities for biomedical research, but also unique challenges, because EHR data are not collected for research purposes. We first consider data quality issues and the learning of differential patterns in healthcare utilization, through multivariate testing and estimation of subgroup differences in the endorsement of billing codes. We then consider the critical problem of pharmacosurveillance: monitoring for rare adverse events once a drug or product is incorporated into routine clinical care. The key issues are the need to provide formal statistical inference for rare outcomes and to offer flexible methods that control for many potential confounders. We provide an influence-function-based statistical framework that incorporates recent theoretical advances from econometrics to study the conditions under which a three-step approach using regression adjustment for the propensity score provides valid and efficient estimation. The influence function representation also yields a variance estimator that fully accounts for the uncertainty from both outcome modeling and propensity score estimation. Finally, we consider potential correlation within healthcare providers and a simple correction to the variance estimator that preserves valid inference.
- Biostatistics