Evaluation of Prediction Performance Metrics in the Rare Event Setting
Author
Minus, Emily
Abstract
Area under the receiver operating characteristic curve (AUC) is a commonly reported measure of discriminative performance for binary prediction models. However, there are concerns that AUC can be a misleading measure of prediction performance in the rare event setting. This setting is commonly encountered with clinical prediction models, since many events of clinical importance, such as suicide, occur only rarely. We conducted a simulation study to investigate what drives inaccurate or unstable AUC performance in the rare event setting. Specifically, we aimed to determine whether the main driver of poor AUC performance is a small number of events, or whether it is the event rate itself (i.e., there are many events, but they represent a small fraction of the total observations). We also investigated the behavior of other commonly used measures of prediction performance, such as positive predictive value (PPV), accuracy, sensitivity, and specificity. Our results indicate that poor AUC performance (as measured by empirical bias, empirical mean squared error (MSE), variability of cross-validated AUC estimates, and empirical coverage of bootstrap intervals) is driven by the number of events, not the event rate. Although the measure of model performance of greatest interest depends on how a model will be used, AUC is reliable in the rare event setting provided that the total number of events is moderately large.
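The distinction between the number of events and the event rate can be illustrated with a toy simulation. The sketch below is not the thesis's simulation design; it is a minimal illustration under assumptions introduced here: risk scores are drawn from unit-variance normal distributions whose means differ by a hypothetical `SEPARATION` of 1.0, event counts are fixed rather than random, and the three scenarios (20 events at a 10% rate, 20 events at a 1% rate, 200 events at a 1% rate) are chosen only for illustration. Under this binormal setup the true AUC is known in closed form, so the empirical bias and spread of the estimated AUC can be compared across scenarios.

```python
import random
import statistics

# Assumed (hypothetical) difference in mean score between events and non-events.
SEPARATION = 1.0
# For unit-variance normal scores, true AUC = Phi(SEPARATION / sqrt(2)).
TRUE_AUC = statistics.NormalDist().cdf(SEPARATION / 2 ** 0.5)


def empirical_auc(event_scores, nonevent_scores):
    """Mann-Whitney estimate of AUC; assumes continuous scores (no ties)."""
    labeled = [(s, 1) for s in event_scores] + [(s, 0) for s in nonevent_scores]
    labeled.sort()
    # Sum of the 1-based ranks held by the event scores.
    rank_sum = sum(rank for rank, (_, y) in enumerate(labeled, start=1) if y == 1)
    n1, n0 = len(event_scores), len(nonevent_scores)
    u = rank_sum - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


def simulate(n_events, n_nonevents, reps, rng):
    """Return `reps` AUC estimates from independently simulated data sets."""
    aucs = []
    for _ in range(reps):
        events = [rng.gauss(SEPARATION, 1.0) for _ in range(n_events)]
        nonevents = [rng.gauss(0.0, 1.0) for _ in range(n_nonevents)]
        aucs.append(empirical_auc(events, nonevents))
    return aucs


# Illustrative scenarios: (number of events, number of non-events).
scenarios = {
    "20 events, 10% rate": (20, 180),
    "20 events, 1% rate": (20, 1980),
    "200 events, 1% rate": (200, 19800),
}

rng = random.Random(2023)  # fixed seed so the run is reproducible
summary = {}
for name, (n1, n0) in scenarios.items():
    aucs = simulate(n1, n0, reps=100, rng=rng)
    bias = statistics.mean(aucs) - TRUE_AUC
    sd = statistics.stdev(aucs)
    summary[name] = (bias, sd)
    print(f"{name}: bias={bias:+.4f}, sd={sd:.4f}")
```

In this toy setup, comparing the first two scenarios holds the number of events fixed while changing the event rate, and comparing the last two holds the rate fixed while changing the number of events. It covers only empirical bias and variability of the AUC estimate, not the cross-validation or bootstrap coverage analyses described in the abstract.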
Description
Thesis (Master's)--University of Washington, 2023
