Traffic Crash Modeling Considering Inconsistent Observations, Interaction Behavior, and Nonlinear Relationships
Traffic collisions are a worldwide problem that causes injuries and deaths and billions of dollars in damages every year. Significant research effort has gone into developing and applying statistical modeling techniques to analyze the characteristics of crash count data. While these techniques have produced meaningful results, they still need improvement to better capture crash risk and its contributing factors. This research identifies five important issues in crash data modeling. The first two, over- or under-dispersion in crash data and excess zeros in crash records, have been well studied in previous research, so this study focuses on the remaining three. The first of these concerns partial observations of multiple processes: crash data may be collected by different agencies, producing multiple data sources that may be inconsistent with one another. A modeling mechanism that takes advantage of all datasets to obtain better estimates is highly desirable. The second is the interaction issue. Some collisions are single-vehicle crashes, such as off-road crashes and rollovers, while others involve interaction behavior, such as Animal-Vehicle Collisions (AVCs) and Vehicle-Vehicle Collisions. Crashes involving interaction behavior have different characteristics from those involving only one vehicle, and it is challenging to develop a crash modeling scheme that captures this interaction. The last is the nonlinear relationship issue. Most previous collision models are Generalized Linear Model-based (GLM-based) approaches. These approaches are constrained by their linear model specifications, yet in most situations the relationship between the crash rate and its contributing factors is not linear and may not even be monotonic.
Finding a way to model collision data with nonlinear and non-monotonic relationships is therefore of utmost importance. To address the issue of inconsistent observations, two techniques are developed. The first is a fuzzy logic-based data mapping algorithm that matches records across two datasets so that duplicate crash records can be removed when the datasets are combined. The membership functions of the fuzzy logic algorithm are established from survey inputs collected from experts at the Washington State Department of Transportation (WSDOT). Verified against expert judgment collected through a second survey, the algorithm's accuracy was approximately 90%. Applying it to two WSDOT AVC datasets, the Reported AVC data and the Carcass Removal (CR) data, yields a combined dataset with 15% to 22% more records than the original CR dataset. The algorithm thus proved effective for merging the Reported AVC and CR data, and the combined dataset is more complete for wildlife safety studies and countermeasure evaluations. The second technique is a diagonal inflated bivariate Poisson (DIBP) regression model, an inflated version of the bivariate Poisson regression model that fits the two datasets jointly. It was also applied to the Reported AVC and CR datasets collected in Washington State between 2002 and 2006. The DIBP model can not only model correlated paired data but also handle under- or over-dispersed datasets. Compared with three other models (double Poisson, bivariate Poisson, and zero-inflated double Poisson), the DIBP model demonstrates its capability to fit two datasets with substantially overlapping portions resulting from the same stochastic process.
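The record-matching idea behind the fuzzy logic-based mapping can be sketched as follows. This is a minimal illustration under assumptions, not the survey-calibrated WSDOT algorithm: two records are scored by how "close in space" and "close in time" they are, using linear falling membership functions whose breakpoints (`loc_full`, `loc_zero`, `day_full`, `day_zero`) are hypothetical placeholders. The two degrees are combined with the minimum (fuzzy AND), and a reported record counts as a duplicate when its best match reaches a hypothetical `threshold`. The names `Record`, `match_degree`, and `merge_datasets` are illustrative, and treating the date gap symmetrically is a simplification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    route: str       # state route identifier
    milepost: float  # location along the route, in miles
    day: date        # crash date (reported AVC) or pickup date (carcass removal)


def falling_membership(diff: float, full: float, zero: float) -> float:
    """Degree of 'close': 1 up to `full`, 0 from `zero` on, linear in between."""
    if diff <= full:
        return 1.0
    if diff >= zero:
        return 0.0
    return (zero - diff) / (zero - full)


def match_degree(a: Record, b: Record,
                 loc_full=0.1, loc_zero=1.0, day_full=1, day_zero=7) -> float:
    """Fuzzy degree that two records describe the same collision.
    Breakpoints are hypothetical placeholders, not survey-derived values."""
    if a.route != b.route:
        return 0.0
    close_in_space = falling_membership(abs(a.milepost - b.milepost), loc_full, loc_zero)
    close_in_time = falling_membership(abs((a.day - b.day).days), day_full, day_zero)
    return min(close_in_space, close_in_time)  # fuzzy AND (minimum t-norm)


def merge_datasets(reported, removed, threshold=0.5):
    """Keep every carcass-removal record and add only the reported records
    that have no sufficiently strong match (i.e. are not duplicates)."""
    merged = list(removed)
    used = set()  # each removal record may absorb at most one reported record
    for r in reported:
        best_idx, best_degree = None, 0.0
        for i, c in enumerate(removed):
            if i in used:
                continue
            degree = match_degree(r, c)
            if degree > best_degree:
                best_idx, best_degree = i, degree
        if best_idx is not None and best_degree >= threshold:
            used.add(best_idx)   # duplicate of an existing CR record
        else:
            merged.append(r)     # new collision not present in the CR data
    return merged
```

Greedy one-to-one matching keeps the sketch short; an assignment-based matcher would avoid order effects when several candidates compete for the same record.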
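The structure of a diagonal inflated bivariate Poisson likelihood can be sketched for a single pair of counts, such as (reported AVCs, CR records) on one segment. This sketch uses the standard bivariate Poisson construction X = Y1 + Y3, Y = Y2 + Y3 with independent Poisson components, so `lam3` is the covariance the two datasets share. It adds diagonal inflation as a mixture that, with probability `p`, places the pair on the diagonal (x = y). Drawing the diagonal value from a Poisson(`theta`) is an assumption; other discrete distributions are possible. In the regression version, `lam1`, `lam2`, `lam3` would depend on segment covariates, typically through log links. That step is omitted here, and the function names are illustrative.

```python
import math


def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X = x, Y = y) for X = Y1 + Y3, Y = Y2 + Y3 with independent
    Yi ~ Poisson(lam_i); Cov(X, Y) = lam3. All lam_i must be > 0."""
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** x / math.factorial(x)
            * lam2 ** y / math.factorial(y))
    ratio = lam3 / (lam1 * lam2)
    series = sum(math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio ** k
                 for k in range(min(x, y) + 1))
    return base * series


def diagonal_inflated_bp_pmf(x, y, lam1, lam2, lam3, p, theta):
    """Mixture: with probability 1 - p the pair comes from the bivariate Poisson;
    with probability p it lies on the diagonal (x == y) with its common value
    drawn from Poisson(theta). The Poisson diagonal is an assumption."""
    value = (1.0 - p) * bivariate_poisson_pmf(x, y, lam1, lam2, lam3)
    if x == y:
        value += p * math.exp(-theta) * theta ** x / math.factorial(x)
    return value


def log_likelihood(pairs, lam1, lam2, lam3, p, theta):
    """Log-likelihood of paired counts, one (x, y) pair per road segment."""
    return sum(math.log(diagonal_inflated_bp_pmf(x, y, lam1, lam2, lam3, p, theta))
               for x, y in pairs)


# Hypothetical paired counts for five segments: (reported AVCs, carcass removals).
segments = [(0, 0), (1, 1), (2, 3), (0, 1), (2, 2)]
print(log_likelihood(segments, lam1=0.3, lam2=0.8, lam3=0.6, p=0.25, theta=1.0))
```

Maximizing this log-likelihood over the parameters (or over covariate coefficients in the regression form) is what fitting the model amounts to. The extra diagonal mass is what lets the model absorb segments where both sources record the same collisions.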
The DIBP model therefore gives researchers a new approach to investigating paired data sources from a different perspective. To address the interaction issue, a new occurrence mechanism-based probability model, the interaction-based model, is introduced; it explicitly formulates the interactions between the objects involved. Applied to the AVC data, the method explicitly formulates the interactions between animals and drivers to better capture the relationships among drivers' and animals' attributes, roadway and environmental factors, and AVCs. The findings show that the proposed occurrence mechanism-based probability model better captures the impact of drivers' and animals' attributes on AVCs. The method can be further developed to model other types of collisions that involve interaction behavior. To address the nonlinear relationship issue, a Generalized Nonlinear Model (GNM)-based approach is put forward. It uses a nonlinear regression function to better describe non-monotonic relationships between the independent and dependent variables. Previous studies focused mainly on causal factor identification and crash risk modeling with Generalized Linear Models (GLMs) such as Poisson regression and logistic regression. However, their basic assumption, a generalized linear relationship established via a link function between the dependent variable (for example, crash rate) and the independent variables (for example, factors contributing to crashes), is often violated in reality. Consequently, GLM-based modeling can yield biased findings and conclusions when contributing factors have a parabolic impact on crashes. In this research, the GNM-based approach is applied to rear-end crash data and AVC data collected on ten highway routes from 2002 to 2006.
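The idea of an occurrence mechanism that separates animal and driver behavior can be illustrated with a simple decomposition. This is a hypothetical sketch, not the thesis's formulation. It assumes an AVC requires two events: an encounter, meaning a vehicle arrives while a crossing animal is in the lane, and then a driver's failure to avoid the animal. Vehicle arrivals are assumed to be a Poisson process, so the encounter probability is 1 - exp(-q·t). The failure probability is assumed logistic in driver and roadway features. All function names, feature names, and coefficients are illustrative assumptions.

```python
import math


def encounter_probability(flow_veh_per_hr: float, time_in_lane_s: float) -> float:
    """P(at least one vehicle arrives while the animal is in the lane),
    assuming Poisson vehicle arrivals at the given hourly flow."""
    rate_per_s = flow_veh_per_hr / 3600.0
    return 1.0 - math.exp(-rate_per_s * time_in_lane_s)


def avoidance_failure_probability(intercept: float, coeffs: dict, features: dict) -> float:
    """Hypothetical logistic model for P(driver fails to avoid | encounter)."""
    z = intercept + sum(coeffs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def expected_avcs(crossings_per_year, flow_veh_per_hr, time_in_lane_s,
                  intercept, coeffs, features):
    """Expected AVCs per year = animal crossings x P(encounter) x P(failure to avoid).
    Animal behavior enters via crossings and time in lane; driver and roadway
    factors enter via the failure model."""
    return (crossings_per_year
            * encounter_probability(flow_veh_per_hr, time_in_lane_s)
            * avoidance_failure_probability(intercept, coeffs, features))


# Hypothetical segment: 200 crossings a year, 400 veh/h, 6 s in the lane.
coeffs = {"speed_limit_mph": 0.03, "night_share": 1.2}
features = {"speed_limit_mph": 60.0, "night_share": 0.4}
print(expected_avcs(200, 400.0, 6.0, -3.0, coeffs, features))
```

Writing the expected count as a product of mechanism-specific probabilities, rather than one log-linear predictor, is what keeps the animal and driver contributions separately interpretable in this sketch.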
For the rear-end collision application, the results show that truck percentage and grade have a parabolic impact: both increase crash risk initially but decrease it beyond certain thresholds. Similarly, Annual Average Daily Traffic (AADT) and grade have a parabolic impact on the AVC rate. Such non-monotonic relationships cannot be captured by standard GLMs with linear specifications, which demonstrates the flexibility of GNM-based approaches in modeling nonlinear relationships and providing more reasonable explanations. The GNM-based interpretations better explain the parabolic impacts of specific contributing factors and can help in selecting and evaluating rear-end crash safety improvement plans. In summary, the solutions proposed for the three major issues in crash modeling are important for crash studies. The fuzzy logic-based data mapping algorithm combines partial observations from different processes into a more complete dataset for thorough analysis. The diagonal inflated bivariate Poisson model takes both data observation processes into account directly. The occurrence mechanism-based probability model and the GNM-based model are effective methods for handling, respectively, the interaction issue and nonlinear relationships between dependent and independent variables.
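How a nonlinear model can locate the threshold at which a factor such as grade switches from raising to lowering crash risk can be sketched as follows. The exact nonlinear regression function used in this research is not given here, so this is a minimal sketch under an assumed form: log μ = b0 + b1·(x − c)² + log(exposure). Here the vertex c is itself a parameter, which makes the model nonlinear in its parameters, and b1 < 0 means risk peaks at x = c. The sketch profiles c over a grid and, for each candidate c, fits (b0, b1) by Newton-Raphson on the Poisson log-likelihood with step halving. The data in the usage lines are synthetic expected counts, not thesis results, and all names are illustrative.

```python
import math


def _loglik(b0, b1, z, y, offset):
    """Poisson log-likelihood with constant log(y!) terms dropped."""
    total = 0.0
    for zi, yi, oi in zip(z, y, offset):
        eta = b0 + b1 * zi + oi
        if eta > 700.0:  # exp() would overflow; treat as an unusable fit
            return -math.inf
        total += yi * eta - math.exp(eta)
    return total


def _fit_given_vertex(z, y, offset, iters=100):
    """Newton-Raphson with step halving for log mu = b0 + b1*z + offset."""
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    current = _loglik(b0, b1, z, y, offset)
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * zi + oi) for zi, oi in zip(z, offset)]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * zi for yi, mi, zi in zip(y, mu, z))
        h00 = sum(mu)
        h01 = sum(mi * zi for mi, zi in zip(mu, z))
        h11 = sum(mi * zi * zi for mi, zi in zip(mu, z))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        t = 1.0
        while t > 1e-8:
            new0, new1 = b0 + t * step0, b1 + t * step1
            candidate = _loglik(new0, new1, z, y, offset)
            if candidate >= current:
                break
            t /= 2.0
        else:
            break  # no improving step left: treat as converged
        b0, b1, current = new0, new1, candidate
        if abs(t * step0) + abs(t * step1) < 1e-10:
            break
    return b0, b1, current


def fit_parabolic_rate(x, y, exposure, vertex_grid):
    """Profile the vertex c of log mu = b0 + b1*(x - c)**2 + log(exposure).
    Returns (c, b0, b1); b1 < 0 means risk rises up to x = c and falls after it."""
    offset = [math.log(e) for e in exposure]
    best = None
    for c in vertex_grid:
        z = [(xi - c) ** 2 for xi in x]
        b0, b1, ll = _fit_given_vertex(z, y, offset)
        if best is None or ll > best[0]:
            best = (ll, c, b0, b1)
    _, c, b0, b1 = best
    return c, b0, b1


# Synthetic, noise-free illustration: grade in percent, exposure in arbitrary units.
grades = [0.5 * i for i in range(17)]  # 0.0 .. 8.0
exposure = [10.0] * len(grades)
counts = [e * math.exp(0.5 - 0.08 * (g - 3.0) ** 2) for g, e in zip(grades, exposure)]
c_hat, b0_hat, b1_hat = fit_parabolic_rate(grades, counts, exposure,
                                           [0.05 * i for i in range(161)])
print(c_hat, b0_hat, b1_hat)
```

Expected counts stand in for observed counts so that the recovered threshold can be checked against the value used to generate them. With real crash counts the same profiling applies, and a confidence interval for c could be read from the profile log-likelihood.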
- Civil engineering