Hosmer-Lemeshow goodness-of-fit test: Translations to the Cox Proportional Hazards Model

dc.contributor.advisor: May, Susanne [en_US]
dc.contributor.author: Guffey, Danielle [en_US]
dc.date.accessioned: 2013-04-17T18:04:45Z
dc.date.available: 2013-10-15T11:06:14Z
dc.date.issued: 2013-04-17
dc.date.submitted: 2012 [en_US]
dc.description: Thesis (Master's)--University of Washington, 2012 [en_US]
dc.description.abstract: Background: The goodness of fit of a statistical model is commonly assessed by describing how well the model fits the observed data. For logistic regression, the Hosmer-Lemeshow goodness-of-fit test compares the number of expected events from the logistic regression model to the number of observed events within deciles of predicted probabilities. This research evaluates two translations of the Hosmer-Lemeshow goodness-of-fit test for logistic regression to the Cox proportional hazards model: the Cook-Ridker (CR) and the D'Agostino-Nam (DAN) tests. These translations are compared to a test designed specifically for survival data, the Grønnesby and Borgan (GB) test. The GB test uses martingale residuals to compare the count of events to the semi-parametric estimates from the Cox proportional hazards model on a cumulative hazards scale. In contrast, the CR and DAN translations compare the non-parametric Kaplan-Meier estimate and the semi-parametric Cox proportional hazards estimate of survival at a fixed time. Methods: The sizes of these tests are investigated by simulating survival data and varying the baseline hazard function (exponential, Weibull and log-logistic), effect size, percentage of censoring, sample size, number of groups, and the choice of fixed time point (for the CR and DAN tests). Results: The sizes of the CR and DAN tests are near the nominal level in very few of the simulated scenarios. For most scenarios, the CR and DAN tests have a size that is either much larger or much smaller than the nominal level. However, when half the maximum simulated time is used as the fixed time point, the sizes of the CR and DAN tests are near or closer to the nominal level in more scenarios than when the maximum time point is used. In addition, numerical issues can occur when the estimated survival probability is zero and when the estimated expected number of events is either close to zero or close to one. These results also expand on previous simulation studies showing that the size of the Grønnesby and Borgan test is notably above 0.05 in larger sample sizes (1000 or more). Conclusions: Although the CR and DAN translations of the Hosmer-Lemeshow goodness-of-fit test to the Cox proportional hazards regression are conceptually intuitive, they appear to have an incorrect size, and numerical issues can occur. The Grønnesby and Borgan test should be used instead, since it has a more appropriate size when used with the correct number of groups. [en_US]
dc.embargo.terms: Restrict to UW for 6 months -- then make Open Access [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.other: Guffey_washington_0250O_11143.pdf [en_US]
dc.identifier.uri: http://hdl.handle.net/1773/22648
dc.language.iso: en_US [en_US]
dc.rights: Copyright is held by the individual authors. [en_US]
dc.subject.other: Biostatistics [en_US]
dc.subject.other: biostatistics [en_US]
dc.title: Hosmer-Lemeshow goodness-of-fit test: Translations to the Cox Proportional Hazards Model [en_US]
dc.type: Thesis [en_US]
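
Illustrative note (not part of the thesis record): the abstract describes the logistic-regression Hosmer-Lemeshow test as a comparison of observed and expected event counts within deciles of predicted probability. The minimal Python sketch below shows that grouping step and the commonly used form of the statistic. The function name `hosmer_lemeshow`, its arguments, and the equal-size grouping rule are assumptions made for illustration and are not taken from the thesis. The CR and DAN translations discussed in the abstract instead build the observed and expected quantities from Kaplan-Meier and Cox survival estimates at a fixed time, which is not shown here.

```python
from typing import Sequence, Tuple


def hosmer_lemeshow(
    y: Sequence[int], p: Sequence[float], groups: int = 10
) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for a Hosmer-Lemeshow-type test.

    y: observed binary outcomes (0 or 1)
    p: predicted event probabilities from a fitted model
    groups: number of risk groups (10 gives deciles)
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    n = len(y)
    if groups < 3 or n < groups:
        raise ValueError("need at least 3 groups and one observation per group")

    # Sort observations by predicted probability, then cut into
    # (nearly) equal-sized groups. This is one common grouping rule.
    order = sorted(range(n), key=lambda i: p[i])

    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in the group
        expected = sum(p[i] for i in idx)   # expected events in the group
        p_bar = expected / n_g              # mean predicted probability
        denom = n_g * p_bar * (1.0 - p_bar)
        if denom <= 0.0:
            # The mean predicted probability is exactly 0 or 1, so the
            # group's contribution is undefined.
            raise ValueError("degenerate group: mean probability is 0 or 1")
        stat += (observed - expected) ** 2 / denom

    # The statistic is conventionally referred to a chi-square
    # distribution with (groups - 2) degrees of freedom.
    return stat, groups - 2
```

A call such as `hosmer_lemeshow(outcomes, predicted, groups=10)` returns the statistic and its degrees of freedom. The explicit check on the denominator mirrors the kind of numerical issue the abstract mentions, where group-level estimates near zero or one make the statistic unstable.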

Files

Original bundle

Name: Guffey_washington_0250O_11143.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format
