Analysis of biased sampling designs using longitudinal data
Zelnick, Leila Ruth
With the increasing availability of prospective cohort studies, registry data, and electronic health records, numerous secondary investigations are being conducted using data that were originally collected for a different primary research goal. When explorations of novel biomarkers and their associations with outcomes are of interest, it is natural to leverage existing cohorts for which stored biological specimens may be available or for which new specimens can be selectively collected and processed, yielding new exposure data. However, limited availability of specimens and limited financial resources may require investigators to target only a subset of patients for any new analyses. In such cases, outcome-dependent sampling (ODS) designs can provide an efficient and cost-effective way to conduct substudies that leverage existing outcomes. In ODS designs, a subsample is chosen based on characteristics of the outcome variable, and detailed covariate data are then collected for the selected subjects. Design and analysis methods that use longitudinal outcomes to guide the choice of a subsample have been shown to improve efficiency over random sampling (Schildcrout et al., 2013), but to date statistical methods have used only the data from the subsample for the final analysis. However, ODS research in the univariate setting (Lawless et al., 1999; Weaver and Zhou, 2005; Chatterjee et al., 2003) has shown that analyzing the incomplete data from unsampled individuals, in addition to the data from those on whom the biomarker has been ascertained, may contribute to improved estimation of target regression parameters. Once an ODS sample is obtained, a variety of analysis approaches can be used to provide valid inference, but the complexity and utility of alternative analysis approaches have not been thoroughly investigated for designs with longitudinal outcome data.
This dissertation focuses on ODS designs and their analysis in the longitudinal setting with continuous outcomes. We examine the potential efficiency gains of this family of designs and analyses from a likelihood perspective and offer robust alternatives that may be preferred under possible model misspecification. Finally, we adapt a standardization technique from the epidemiological and causal inference literature that may provide benefit in analyzing ODS data from longitudinal processes and can be used to accommodate unanticipated missingness caused by participants dropping out of a study prior to its completion. In each case, the methods explored here are illustrated for a hypothetical biomarker substudy, using data from the Cystic Fibrosis Foundation Patient Registry.
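As a rough illustration of the ODS idea described above (a subsample chosen from characteristics of the longitudinal outcome), the following minimal Python sketch summarizes each subject's outcome trajectory by a least-squares slope and then oversamples subjects with extreme slopes. It is not the dissertation's method: the slope summary, the stratum cutoffs, the sampling probabilities, the simulated cohort, and the function names `subject_slope`, `ods_sample`, and `simulate_cohort` are all assumptions made for illustration.

```python
import random
from statistics import mean


def subject_slope(times, values):
    """Ordinary least-squares slope of one subject's outcome on time."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def ods_sample(slopes, probs=(0.9, 0.1, 0.9), cuts=(0.1, 0.9), seed=0):
    """Sample subjects with probabilities that depend on their outcome slope.

    slopes: dict mapping subject id -> slope.
    probs:  hypothetical sampling probabilities for the low, middle,
            and high slope strata.
    cuts:   hypothetical quantile cutoffs that define the strata.
    Returns a dict mapping each sampled id to its sampling probability,
    which a later weighted analysis could use.
    """
    rng = random.Random(seed)
    ordered = sorted(slopes.values())
    n = len(ordered)
    low_cut = ordered[int(cuts[0] * n)]
    high_cut = ordered[min(int(cuts[1] * n), n - 1)]
    sampled = {}
    for sid, slope in slopes.items():
        if slope < low_cut:
            p = probs[0]
        elif slope > high_cut:
            p = probs[2]
        else:
            p = probs[1]
        if rng.random() < p:
            sampled[sid] = p
    return sampled


def simulate_cohort(n_subjects=500, n_visits=5, seed=1):
    """Simulate a hypothetical cohort with subject-specific linear trends."""
    rng = random.Random(seed)
    cohort = {}
    for sid in range(n_subjects):
        intercept = rng.gauss(80.0, 10.0)  # hypothetical baseline level
        slope = rng.gauss(-1.5, 1.0)       # hypothetical change per visit
        times = list(range(n_visits))
        values = [intercept + slope * t + rng.gauss(0.0, 3.0) for t in times]
        cohort[sid] = (times, values)
    return cohort


cohort = simulate_cohort()
slopes = {sid: subject_slope(t, y) for sid, (t, y) in cohort.items()}
sampled = ods_sample(slopes)
```

In this sketch the expensive covariate (for example, a biomarker) would be measured only for the subjects in `sampled`, while the outcome trajectories of everyone in `cohort` remain available. Keeping the sampling probabilities alongside the sampled ids is one simple design choice that leaves room for a weighted analysis; likelihood-based analyses would account for the design differently.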
- Biostatistics