Comparison of Several Statistical Tests for Evaluating Novel Treatments in the Out-of-Hospital Cardiac Arrest Setting

Li, Jiahe

Files

Li_washington_0250O_21482.pdf (1.48 MB)

Date

2020-08-14

Author

Li, Jiahe

Abstract

In the out-of-hospital cardiac arrest (OHCA) setting, it is generally agreed that a clinically meaningful endpoint, such as survival with neurological and physiological status similar to that before the arrest, is a good outcome. For logistical reasons (especially sample size), an intermediate endpoint, for example return of spontaneous circulation (ROSC) or survival to hospital admission, is commonly used as a surrogate, under the assumption that the survival rate conditional on achieving the intermediate outcome does not depend on the treatment. However, trials in this field have demonstrated that this assumption does not always hold, in which case the advantage of a reduced sample size no longer applies. Hence, focusing solely on a univariate intermediate endpoint is inadequate for evaluating improvements in the OHCA setting, and it is necessary to evaluate alternative statistical tests that combine information from both the clinically meaningful endpoint and the intermediate endpoint. In this study, I compare the statistical performance of four tests of a novel intervention versus standard of care: two standard univariate tests, in which the intermediate endpoint and survival are each used individually; a bivariate test, in which the intermediate endpoint and conditional survival are considered jointly; and a combined test, in which survival is tested with limited loss of power compared to the univariate test based only on the intermediate endpoint. I provide equations that closely approximate the critical values of the combined test and obtain by simulation the required sample sizes for the bivariate and combined tests. The four tests are compared in terms of test size under a moderately small sample size, and in terms of power (and false positive rates) under different typical scenarios in which conditional survival and the intermediate outcome are assumed to be independent. 
I also evaluate test size when the independence assumption fails, to assess the feasibility of the tests in real-world use. Finally, I summarize the commonalities and differences in the statistical behavior of these four methods and illustrate their advantages and flaws. The results do not indicate that any one test outperforms all the others across a typical range of control rates. For the Type I error rate, all the tests achieve a test size close to the nominal α-level unless the sample size and control rates are too small for the Central Limit Theorem (CLT) approximation to apply. As expected, the sample sizes required by the bivariate and combined tests to attain a given power for a pre-defined effect size and control rates are much smaller than that required for survival alone, although more patients are required for the combined test. For the goal of detecting a treatment that is better than standard of care, the combined test works better because it offers protection when survival is worsened, whereas the bivariate test cannot determine the effect of a treatment on survival as long as there is any improvement in the intermediate outcome. Under a reasonable dependency assumption, the results suggest that the tests are applicable in the real world. Future work might extend this study from one-sided to two-sided alternatives, so that both positive and negative effects can be evaluated when comparing new treatments with standard of care. One could also explore statistical performance under other possible dependency assumptions to better reflect real cases. Overall, this work is a starting point for the statistical comparison of these methods, and there remains room for further scientific investigation. I hope this work can support more informed and meaningful decision making about OHCA strategies.
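
The sample-size argument for using an intermediate endpoint can be illustrated with a minimal sketch. It is not taken from the thesis: the rates below are hypothetical, the helper name `n_per_arm` is invented for illustration, and the formula is the standard unpooled normal approximation for a one-sided two-proportion z-test. It assumes that survival requires ROSC and that survival conditional on ROSC is the same in both arms, which is the surrogate assumption described in the abstract.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treat, alpha=0.025, power=0.90):
    """Approximate per-arm sample size for a one-sided two-proportion z-test
    (unpooled-variance normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p_treat - p_control) ** 2)


# Hypothetical rates, chosen only for illustration (not from the thesis).
rosc_control, rosc_treat = 0.25, 0.30
surv_given_rosc = 0.30  # assumed identical in both arms (surrogate assumption)

# Survival requires ROSC, so the marginal survival rate is the product.
surv_control = rosc_control * surv_given_rosc
surv_treat = rosc_treat * surv_given_rosc

n_rosc = n_per_arm(rosc_control, rosc_treat)
n_survival = n_per_arm(surv_control, surv_treat)
print(f"per-arm n, ROSC endpoint:     {n_rosc}")
print(f"per-arm n, survival endpoint: {n_survival}")
```

Under these assumed rates the survival endpoint needs considerably more patients per arm than the ROSC endpoint, which is the logistical motivation for the surrogate. If conditional survival differs between arms, the product relationship above no longer transfers the ROSC effect to survival in the same way, and a test on ROSC alone can be misleading.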