Machine learning analysis of wrong-way driving crash severity factors: evidence from California Highway Patrol data
Abstract
Wrong-way driving (WWD) crashes remain a major highway safety concern in the United States, yet many existing datasets do not capture key behavioral details, such as the distance traveled in the wrong direction before a crash. This study uses a comprehensive five-year California Highway Patrol dataset (2016–2020) to examine WWD crash severity using tree-based ensemble learning models, with Random Forest (RF) as a baseline model and XGBoost selected as the final model for interpretation. The dataset includes unique variables such as WWD distance, driver demographics, blood alcohol concentration (BAC), safety equipment use, and crash context. Among the tested models, XGBoost achieved the best overall performance, with an accuracy of 56.35%, and showed good classification capability across fatal, injury, and property-damage-only (PDO) crashes. Variable importance results identified driver age, WWD distance, BAC, time of day, number of vehicles involved, safety equipment use, and driver sex as the most influential predictors. Partial dependence analysis revealed strong non-linear effects: younger and older drivers, higher BAC levels, longer WWD distances, nighttime conditions, and multi-vehicle involvement were associated with more severe crash outcomes, while seatbelt use consistently reduced severity. In addition, the Severity Index analysis, where higher values represent greater economic loss, showed that age, WWD distance, number of vehicles, and safety equipment were also key determinants of crash-related economic burden. The findings support targeted countermeasures, including education for high-risk drivers, rapid correction strategies near WWD entry points, and stronger enforcement of seatbelt and impaired-driving laws.
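To illustrate what a partial dependence curve measures, the following minimal sketch computes one by hand: a single feature is fixed at each grid value while the model's predictions are averaged over all other observed feature values. Everything here is an assumption for illustration only: the records are invented toy values (not California Highway Patrol data), `toy_severity_score` is a hypothetical stand-in for a fitted model (not the study's XGBoost model), and the coefficients were chosen simply to produce a U-shaped age effect.

```python
from statistics import mean

# Hypothetical toy records: (driver_age, bac, wwd_distance_miles).
# These values are invented for illustration and are NOT CHP data.
records = [
    (19, 0.00, 0.2),
    (34, 0.12, 1.5),
    (52, 0.08, 0.7),
    (71, 0.00, 3.0),
    (27, 0.20, 2.2),
]


def toy_severity_score(age, bac, distance):
    """Hypothetical stand-in for a fitted model's predicted severity score."""
    age_term = ((age - 45) / 30) ** 2  # U-shaped in age (assumed form)
    return age_term + 4.0 * bac + 0.3 * distance


def partial_dependence(model, data, feature_index, grid):
    """Average the model's prediction over the data, with one feature
    overwritten by each grid value in turn."""
    curve = []
    for value in grid:
        preds = []
        for row in data:
            modified = list(row)
            modified[feature_index] = value  # fix the feature of interest
            preds.append(model(*modified))
        curve.append(mean(preds))
    return curve


age_grid = [20, 45, 70]
pd_age = partial_dependence(toy_severity_score, records, 0, age_grid)

for age, score in zip(age_grid, pd_age):
    print(f"age={age}: average predicted score={score:.3f}")
```

In this toy setup the curve is higher at ages 20 and 70 than at 45, which is the kind of non-linear pattern a partial dependence plot is designed to expose. With a real fitted model, the same averaging would be applied to the model's predicted class probabilities.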