An Evaluation of Flexible Summary Measures for the Comparison of Binary Outcomes in Non-Inferiority Trials
Thommes, Erika Elizabeth
In clinical trials, the comparison of binary outcomes between two independent treatment groups is most commonly measured by either relative or absolute differences between outcome rates. In the setting of a non-inferiority (NI) trial, an NI margin corresponding to one of these measures is defined to represent the maximum clinically meaningful limit by which an experimental intervention will be considered allowably inferior to a standard-of-care (SOC) treatment regimen. In the instance of extreme event rates, special consideration should be given to the intervention's allowable outcome rates as defined by the SOC rate and the NI margin in order to produce a meaningful assessment of the intervention's impact on public health. We propose one method by which the definition of inferiority can be relaxed in the case of extremely rare failure events. Using the failure rate among subjects receiving the SOC, we introduce a clinically meaningful threshold at which the comparison of treatment groups will switch from a conservative relative comparison to a potentially more meaningful and interpretable absolute comparison. This threshold is to be defined at a failure rate at which study investigators feel comfortable enough with the rarity of events among subjects receiving the SOC such that they are willing to increase the allowable failure rate among those receiving the intervention. This threshold is further defined in a manner that maintains a continuous margin by which NI can be judged for two binary outcomes throughout the parameter space. Focusing on asymptotic methods, we compare statistical inference based on the Wald, score and likelihood ratio (LR) statistics under this proposal with that of a standard relative comparison of outcome rates. We illustrate the potential advantages of our proposal based on the relaxed assumption of inferiority for extremely low failure rates. 
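As an illustration of how such a threshold could keep the margin continuous, the sketch below computes the largest intervention failure rate that would still count as non-inferior for a given SOC failure rate. The abstract does not state the formula, so everything here is an assumption: the function name `allowable_failure_rate`, the parameters `rr_margin` (a relative-risk NI margin) and `threshold`, and in particular the choice of absolute margin `(rr_margin - 1) * threshold`, which is simply the value that makes the relative and absolute limits agree at the threshold.

```python
def allowable_failure_rate(p_soc: float, rr_margin: float, threshold: float) -> float:
    """Largest intervention failure rate still considered non-inferior.

    Hypothetical sketch, not the thesis's stated formula:
    - above the threshold, a relative comparison applies: rr_margin * p_soc
    - at or below the threshold, an absolute comparison applies: p_soc + delta
    where delta is chosen so both rules give the same limit at the threshold.
    """
    if not 0.0 <= p_soc <= 1.0:
        raise ValueError("p_soc must be a probability in [0, 1]")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if rr_margin <= 1.0:
        raise ValueError("rr_margin must exceed 1 for a failure outcome")

    if p_soc > threshold:
        # Relative comparison: limit scales with the SOC failure rate.
        return min(1.0, rr_margin * p_soc)

    # Absolute comparison: fixed additive allowance (assumed form).
    delta = (rr_margin - 1.0) * threshold
    return p_soc + delta


if __name__ == "__main__":
    # Assumed illustrative values: relative margin 1.5, threshold 5%.
    for p in (0.10, 0.05, 0.01):
        print(p, round(allowable_failure_rate(p, 1.5, 0.05), 4))
```

Under these assumed values, an SOC failure rate of 1% gives an allowable intervention rate of about 3.5% under the absolute rule, compared with 1.5% under a purely relative margin, which is the sense in which the definition of inferiority is relaxed for rare events.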
We evaluate the type 1 error and coverage probability of our proposed method against those of relative and absolute comparisons. Finally, we compare the commonalities and differences in the statistical behavior of these three asymptotic methods under a fixed trial design versus a group sequential sampling design. Using a one-sided significance level of 2.5%, our results indicate that the type 1 error rate under our threshold proposal is similar to that of a relative comparison when the observed SOC failure rate is above the threshold and to that of an absolute comparison when the observed SOC failure rate is at or below the threshold. Similarly, coverage probability remains stable at approximately 95% for the Wald, score and LR-based 95% confidence intervals under both fixed and sequential sampling designs, with slightly more variability in the latter. The probability of concluding non-inferiority when there truly is no difference in failure rates between the two treatment groups increased by up to 40% for a 7% threshold and up to 30% for a 5% threshold under the investigated scenarios. Marginal gains in power of up to 15% exist when detecting failure rates within the region of relaxed inferiority, but the most promising gains in power (up to 40% in our simulations) are observed in the region deemed NI by both the relative and absolute NI margins. We conclude that our threshold proposal increases the probability of detecting meaningfully non-inferior interventions without significantly increasing type 1 error or decreasing coverage probability. Future work might investigate a trial design in which more than one threshold, each with a corresponding rule for comparing treatments, can be invoked for a series of rare outcome rates. Alternatively, one might consider the statistical behavior when switching from a relative to an absolute comparison in the case of extremely frequent outcomes.
Overall, this work is a starting point by which the statistical comparison of two treatment arms can become more flexible in order to adhere most closely to a meaningful scientific comparison.
- Biostatistics