Searching for Predictive Subgroups
Prince, David Karl
Our increased understanding of genomics and related fields has led to (1) the identification of subtypes of cancer based on various biomarkers and (2) the development of drugs to target specific subtypes. Clinical trials then test whether these novel treatments are effective for the population in question. However, the decision about which patients to enroll in a confirmatory trial is typically based on limited empirical evidence. In this dissertation we propose two methods that use phase 2 trial results to identify the group that benefits from a treatment, where this group could be all, some, or none of the enrolled population. This contrasts with the traditional approach, in which the primary analysis of a trial focuses on deciding between all or none. The primary novel aspect of our methods is finer control over the definition of a predictive subgroup. Our first method is an application of logic regression, a tree-based classification algorithm. The method identifies a subgroup of patients that has differential treatment benefit by finding the Boolean statement of binary baseline covariates most strongly associated with an outcome. The second method, which we call SHAPES, restricts candidate subgroups to connected and convex or co-convex collections of points in the Boolean space. For both methods we develop versions for continuous and binary outcomes, along with decision rules and utility functions. We develop and report several metrics to measure performance in this setting, including correct group identification rates, the power to detect a subgroup at or above a specified effect, and any-rejection rates. In simulation studies, we evaluate our methods using these metrics under various scenarios, including a range of subgroup and full population effect sizes, the prevalence of subgroups, the presence of effects in the absence of treatment (prognostic effects), the number of covariates considered, sample sizes, and tuning parameter values.
In the presence of large subgroup effect sizes, simple subgroup definitions, few covariates (2 to 4), moderate sample sizes (200), and correct tuning, our method identifies the true subgroup well. With more complex subgroup definitions or incorrect tuning, performance deteriorates quickly. We apply the methods to a clinical trial in acute myeloid leukemia and explore the relationship between the identified subgroup and the tuning parameter settings.
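The idea of defining a predictive subgroup by a Boolean statement of binary baseline covariates can be illustrated with a small simulation. The sketch below is not the dissertation's logic regression or SHAPES procedure: it is a plain exhaustive search over a handful of one- and two-covariate rules, and the function names, the simulated data, and the scoring rule (treatment effect inside the candidate subgroup minus treatment effect outside it) are all assumptions chosen for illustration.

```python
import itertools
import random
from statistics import mean


def simulate_trial(n=400, n_covariates=3, subgroup_effect=2.0, seed=0):
    """Simulate a two-arm trial with binary covariates (hypothetical setup).

    Treatment helps only patients with x0 == 1 and x1 == 1; there are no
    prognostic effects in this toy example.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(n_covariates))
        treated = rng.randint(0, 1)
        in_subgroup = x[0] == 1 and x[1] == 1
        benefit = subgroup_effect if (treated and in_subgroup) else 0.0
        y = rng.gauss(0.0, 1.0) + benefit  # continuous outcome
        rows.append((x, treated, y))
    return rows


def treatment_effect(rows):
    """Difference in mean outcome, treated minus control (None if an arm is empty)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)


def candidate_rules(n_covariates):
    """Yield (name, predicate) pairs for simple Boolean subgroup definitions."""
    for i in range(n_covariates):
        yield f"x{i}", (lambda x, i=i: x[i] == 1)
    for i, j in itertools.combinations(range(n_covariates), 2):
        yield f"x{i} AND x{j}", (lambda x, i=i, j=j: x[i] == 1 and x[j] == 1)
        yield f"x{i} OR x{j}", (lambda x, i=i, j=j: x[i] == 1 or x[j] == 1)


def best_rule(rows, n_covariates):
    """Return the (name, score) of the rule with the largest effect contrast."""
    best = None
    for name, rule in candidate_rules(n_covariates):
        inside = [r for r in rows if rule(r[0])]
        outside = [r for r in rows if not rule(r[0])]
        eff_in = treatment_effect(inside)
        eff_out = treatment_effect(outside)
        if eff_in is None or eff_out is None:
            continue
        score = eff_in - eff_out  # assumed score: contrast in treatment effect
        if best is None or score > best[1]:
            best = (name, score)
    return best


if __name__ == "__main__":
    trial = simulate_trial()
    print(best_rule(trial, n_covariates=3))
```

With a large simulated subgroup effect and few covariates, a search of this kind tends to recover the true rule. A naive contrast like this one has no control over false discoveries, which is the role played by the decision rules and tuning parameters described in the abstract.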
- Biostatistics