Comparing Internal Validation Methods for a Random Forest Prediction Model of Suicide Death
| dc.contributor.advisor | Coley, Yates | |
| dc.contributor.author | Liao, Qinqing | |
| dc.date.accessioned | 2020-10-26T20:39:50Z | |
| dc.date.available | 2020-10-26T20:39:50Z | |
| dc.date.issued | 2020-10-26 | |
| dc.date.submitted | 2020 | |
| dc.description | Thesis (Master's)--University of Washington, 2020 | |
| dc.description.abstract | Predictive models estimated with clinical data are increasingly popular in the medical field. After developing a prediction model, it is necessary to evaluate its performance in practice, that is, to validate the model. Model validation methods include both internal and external validation; this thesis focuses on comparing internal validation methods that use either a split sample or an entire sample approach. The split sample approach uses a typical randomly selected validation set. For the entire sample approach, we explored three different methods: approximate optimism correction, exact optimism correction, and 5-fold cross-validation (CV). The dataset included 13,980,570 records of mental health outpatient visits between 2011 and 2017, with information on diagnoses, medications, and encounters prior to the visit and follow-up information on suicide death. Data were separated into a development dataset, which included visits from 2011 to 2014 and was used for model estimation and internal validation, and a prospective validation set, which included visits from 2015 to 2017 and was used to mimic the future data that would be encountered if the model were implemented in clinical practice. We estimated a random forest model to predict suicide death in the 90 days following a visit. We found that the split sample estimation method and 5-fold CV using the entire sample provided more accurate estimates of model performance than the approximate and exact optimism correction methods using the entire sample, which both underestimated model optimism and, thus, overestimated model performance in the prospective dataset. Our results stand in contrast to prior research, which demonstrated the accuracy of optimism correction methods with logistic regression models estimated using an entire sample approach. While findings may differ for other datasets, model estimation methods, and prediction applications, we recommend caution when using optimism correction methods for internal validation of prediction models estimated in the entire sample when working with very large datasets, rare events, and machine learning prediction models. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Liao_washington_0250O_22108.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/46384 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Electronic Health Records | |
| dc.subject | Internal Validation | |
| dc.subject | Optimism Correction | |
| dc.subject | Predictive Model | |
| dc.subject | Random Forest | |
| dc.subject | Biostatistics | |
| dc.subject.other | Biostatistics | |
| dc.title | Comparing Internal Validation Methods for a Random Forest Prediction Model of Suicide Death | |
| dc.type | Thesis |
Files
Original bundle
- Name: Liao_washington_0250O_22108.pdf
- Size: 660.32 KB
- Format: Adobe Portable Document Format
