Evaluation of Prediction Performance Metrics in the Rare Event Setting

Author

Minus, Emily

Abstract

Area under the receiver operating characteristic curve (AUC) is a commonly reported measure of discriminative performance for binary prediction models. However, there are concerns that AUC may be a misleading measure of prediction performance in the rare event setting. This setting is common for clinical prediction models, since many events of clinical importance, such as suicide, occur rarely. We conducted a simulation study to investigate what drives inaccurate or unstable AUC estimates in the rare event setting. Specifically, we aimed to determine whether the main driver of poor AUC performance is a small number of events, or whether it is truly the event rate (i.e., there are many events, but they represent a small fraction of the total observations). We also investigated the behavior of other commonly used measures of prediction performance, such as positive predictive value (PPV), accuracy, sensitivity, and specificity. Our results indicate that poor AUC performance, as measured by empirical bias, empirical mean squared error (MSE), variability of cross-validated AUC estimates, and empirical coverage of bootstrap intervals, is driven by the number of events, not by the event rate. Which measure of model performance is of greatest interest depends on how a model will be used; however, AUC is reliable in the rare event setting provided that the total number of events is moderately large.
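To make the "number of events versus event rate" question concrete, here is a minimal sketch of that kind of simulation. It is not the thesis's code: it assumes a binormal data-generating model (case scores drawn from N(1, 1), control scores from N(0, 1), so the true AUC is Φ(1/√2) ≈ 0.76), estimates AUC with the Mann-Whitney rank formula, and reports only empirical bias and the standard deviation of the estimates (not MSE, cross-validation, or bootstrap coverage). The function names `empirical_auc` and `simulate_auc` are hypothetical.

```python
import random
import statistics


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney AUC: P(case score > control score), ties counted as 1/2."""
    n1, n0 = len(case_scores), len(control_scores)
    # Pool scores with a label (1 = case, 0 = control) and sort by score.
    pooled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores])
    case_rank_sum = 0.0
    i = 0
    while i < len(pooled):
        # Find the block pooled[i..j] of tied scores and count its cases.
        j = i
        cases_in_block = pooled[i][1]
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
            cases_in_block += pooled[j][1]
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        case_rank_sum += midrank * cases_in_block
        i = j + 1
    # Convert the case rank sum (Wilcoxon W) into the AUC.
    return (case_rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def simulate_auc(n_events, event_rate, separation=1.0, n_reps=100, seed=0):
    """Return (mean, SD) of empirical AUC over n_reps simulated datasets.

    Assumption: binormal scores with unit variance; cases are shifted
    by `separation`. The number of controls is chosen so that
    n_events / (n_events + n_controls) equals event_rate.
    """
    rng = random.Random(seed)
    n_controls = round(n_events * (1 - event_rate) / event_rate)
    aucs = []
    for _ in range(n_reps):
        cases = [rng.gauss(separation, 1.0) for _ in range(n_events)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        aucs.append(empirical_auc(cases, controls))
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    true_auc = statistics.NormalDist().cdf(1.0 / 2 ** 0.5)
    print(f"true AUC = {true_auc:.3f}")
    # Same rare rate with few vs. more events; then few events at a common rate.
    for n_events, rate in [(20, 0.02), (100, 0.02), (20, 0.50)]:
        mean_auc, sd_auc = simulate_auc(n_events, rate)
        print(f"events={n_events:4d} rate={rate:.2f} "
              f"bias={mean_auc - true_auc:+.3f} SD={sd_auc:.3f}")
```

Comparing the first two scenarios (same event rate, more events) with the first and third (same number of events, different event rate) mirrors the question posed above. The printed values depend on the seed and the assumed model; they are an illustration, not results from the thesis.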

Description

Thesis (Master's)--University of Washington, 2023
