Developing a Clustering-Based Empirical Bayes Analysis Method for Hotspot Identification
Hotspot identification (HSID) is a critical part of network-wide safety evaluation. Put simply, HSID involves ranking sites (e.g., roadway segments or intersections) on the basis of observed and/or estimated safety so that they may be prioritized for treatment. Typical ranking methods are rooted in the Empirical Bayes (EB) method, which estimates safety from both the observed crash history and crash frequency predictions based on similar sites. Such procedures improve on naïve methods that consider only observed crash frequencies or rates, because they can account for regression-to-the-mean bias and are less subject to random variation in the crash data. That said, the performance of the EB method depends heavily on the selection of the reference group: the sites similar to the target site from which the safety performance function (SPF) used to predict crash frequency is developed. Because crash data often contain underlying heterogeneity that can make them appear to be generated from distinct subpopulations, methods are needed to select similar sites in a principled manner. To address this heterogeneity problem, EB-based HSID methods were developed that use common clustering methodologies (e.g., mixture models, K-means, and hierarchical clustering) to select “similar” sites for building SPFs. The performance of the clustering-based EB methods was then compared by using real crash data. HSID results computed with Texas undivided rural highway crash data suggested that all three clustering-based EB analysis methods are preferable to conventional statistical methods. HSID accuracy may therefore be further improved by properly classifying roadway segments on the basis of the heterogeneity in the data.
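As a rough illustration of the EB step described above, the following minimal sketch combines each site's observed crash count with an SPF prediction and ranks sites by the result. It assumes the common negative binomial form of the EB weight, w = 1 / (1 + k * predicted), where k is the overdispersion parameter of the SPF; the abstract does not state which form was used in this work. The site names, crash counts, predictions, and the value of k are all hypothetical, and in a clustering-based variant the prediction and k would come from the SPF fitted to the site's own cluster.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    observed: float   # observed crash count over the study period (hypothetical)
    predicted: float  # SPF-predicted crash count for the same period (hypothetical)


def eb_estimate(observed: float, predicted: float, dispersion: float) -> float:
    """Weighted average of the SPF prediction and the observed count.

    Assumes a negative binomial SPF with variance mu + dispersion * mu**2,
    which gives the weight w = 1 / (1 + dispersion * mu).
    """
    weight = 1.0 / (1.0 + dispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


def rank_sites(sites: list[Site], dispersion: float) -> list[tuple[str, float]]:
    """Return (name, EB estimate) pairs, highest expected crash count first."""
    scored = [(s.name, eb_estimate(s.observed, s.predicted, dispersion)) for s in sites]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sites; in practice each would use its own cluster's SPF.
sites = [Site("A", 12, 4.0), Site("B", 9, 8.0), Site("C", 2, 5.0)]
ranking = rank_sites(sites, dispersion=0.5)

for name, estimate in ranking:
    print(f"{name}: {estimate:.2f}")
```

Because the weight shrinks as the predicted count or the overdispersion grows, sites whose reference group is poorly described by the SPF lean more on their own crash history, which is one reason the choice of reference group matters for the ranking.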