Multi-classification Prediction and Interaction Effects of Determinants for Accident Severity on Two-lane Highways in Plateau Mountainous Region
Abstract: Existing studies of accident severity on two-lane highways in plateau mountainous regions suffer from insufficient accuracy in multi-class prediction and leave the interaction mechanisms among contributing factors unclear. To address these issues, this study proposes an interpretable machine-learning framework for three-class accident severity prediction that combines feature selection, extreme gradient boosting (XGBoost), hyperparameter optimization with a genetic algorithm (GA), and partial dependence plots (PDP). Using accident data from 2012 to 2017 on mountainous two-lane highways in Yunnan, a GA-XGBoost model is built on 14 features covering road geometry, traffic environment, and the vehicles involved. The model is compared with random forest (RF), support vector machine (SVM), and a baseline XGBoost model, and PDP are used to explore how different risk factors influence accident severity. The results show that: ① The GA-XGBoost model has the best overall prediction performance, with accuracy, precision, and recall of 81.57%, 73.12%, and 82.68%, respectively. After GA optimization, its predictive ability for injury and fatal accidents improves by 14.58% and 50.00%, respectively, over the pre-optimization model, and it correctly classifies three times as many fatal accidents as either the RF or the SVM model, which markedly improves the prediction of severe accidents. ② Vehicle characteristics and traffic environment characteristics have the greater influence on accident occurrence; the at-fault vehicle type, the involved vehicle type, the accident type, and the daily traffic volume are the four most influential risk factors. ③ Regardless of the type of accident, the involvement of pedestrians or motorcycles significantly increases accident severity, and the severity-raising effect of pedestrian involvement is 1.25 to 5 times that of the other involvement types. In addition, the severity-raising effect of side collisions grows gradually as traffic volume increases.
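To make the interaction analysis concrete, the following is a minimal Python sketch of how a two-feature partial dependence value can be computed: the chosen features are forced to each grid point for every record, and the model outputs are averaged. It is not the study's implementation. The function `partial_dependence`, the feature names, and `toy_predict` are hypothetical, and the coefficients inside `toy_predict` are invented stand-ins for a fitted GA-XGBoost model with no empirical meaning.

```python
from itertools import product


def partial_dependence(predict, rows, features, grids):
    """Average predict(row) over all rows with `features` forced to each grid point."""
    result = {}
    for values in product(*grids):
        outputs = []
        for row in rows:
            modified = dict(row)                     # copy so the data stay untouched
            modified.update(zip(features, values))   # overwrite the studied features
            outputs.append(predict(modified))
        result[values] = sum(outputs) / len(outputs)
    return result


def toy_predict(row):
    """Hypothetical severity score; all coefficients are made up for illustration."""
    score = 0.10
    if row["involved_party"] == "pedestrian":
        score += 0.30
    if row["accident_type"] == "side":
        score += 0.0001 * row["daily_volume"]
    return min(score, 1.0)


# Three made-up records, not taken from the accident data set.
rows = [
    {"involved_party": "car", "accident_type": "head_on", "daily_volume": 1000},
    {"involved_party": "pedestrian", "accident_type": "side", "daily_volume": 2000},
    {"involved_party": "motorcycle", "accident_type": "side", "daily_volume": 3000},
]

pdp = partial_dependence(
    toy_predict,
    rows,
    features=["accident_type", "daily_volume"],
    grids=[["head_on", "side"], [1000, 3000]],
)
for point, value in pdp.items():
    print(point, round(value, 3))
```

In this toy setting the averaged score for side collisions rises with traffic volume while the score for head-on collisions does not, which is the kind of interaction pattern a two-feature PDP is meant to expose.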
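Table 1. Description of variables
Variable | Coded value | Proportion/%
Gender | 1-Male | 87.4
 | 2-Female | 12.6
Age/years | 1-≤29 | 18.3
 | 2->29~39 | 33.1
 | 3->39~48 | 30.9
 | 4->48 | 17.7
Daily traffic volume/(veh/d) | 1-≤1500 | 10.0
 | 2->1500~2500 | 42.0
 | 3->2500~3500 | 37.2
 | 4->3500 | 10.8
Road alignment | 1-Straight | 39.3
 | 2-Circular curve | 36.7
 | 3-Transition curve | 24.0
Slope length/m | 1-<1000 | 67.5
 | 2-≥1000 | 32.5
Gradient(a)/(°) | 1-[0, 3] | 55.6
 | 2->3 | 44.4
At-fault party | 1-Small passenger car | 72.8
 | 2-Truck | 12.3
 | 3-Large or medium bus | 7.5
 | 4-Motorcycle | 6.3
 | 5-Other vehicle types | 1.1
Involved party | 1-Small passenger car | 47.8
 | 2-Truck | 3.9
 | 3-Large or medium bus | 2.2
 | 4-Motorcycle | 26.1
 | 5-Pedestrian | 4.2
 | 6-Non-motorized vehicle | 14.1
 | 7-Other | 1.7
Day/night | 1-Day (07:00-19:00) | 76.5
 | 2-Night (19:00-07:00) | 23.5
Weekday/weekend | 1-Weekday | 68.2
 | 2-Weekend | 31.8
Sunny/non-sunny | 1-Sunny | 92.7
 | 2-Non-sunny | 7.3
Quarter | 1-First quarter | 29.1
 | 2-Second quarter | 20.1
 | 3-Third quarter | 18.6
 | 4-Fourth quarter | 32.2
Dry/wet | 1-Dry | 94.0
 | 2-Wet | 6.0
Accident type | 1-Head-on collision | 56.8
 | 2-Side collision | 27.7
 | 3-Single-vehicle accident | 2.9
 | 4-Rear-end collision | 9.7
 | 5-Rollover | 8.6
 | 6-Collision with obstacle | 3.6
 | 7-Other | 0.5
Note: (a) The accident data do not distinguish the direction of travel, so the gradient given here is the absolute value of the actual gradient.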
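As an illustration of the coding scheme in Table 1, the sketch below maps raw continuous values to the ordinal codes. It assumes that ranges written as ">a~b" include the upper bound b, which is how the table reads but is not stated explicitly. The function and constant names are hypothetical and are not taken from the study.

```python
from bisect import bisect_left

# Upper bounds of the first three bins, read from Table 1.
VOLUME_UPPER_BOUNDS = [1500, 2500, 3500]  # daily traffic volume, veh/d
AGE_UPPER_BOUNDS = [29, 39, 48]           # driver age, years


def encode_right_closed(value, upper_bounds):
    """Return the 1-based code of the right-closed bin (lo, hi] containing value.

    Values above the last bound fall into the final open-ended bin.
    """
    return bisect_left(upper_bounds, value) + 1


def encode_gradient(signed_gradient_deg):
    """Code 1 for |gradient| in [0, 3] degrees, code 2 otherwise.

    The absolute value follows the table note: direction of travel is unknown.
    """
    return 1 if abs(signed_gradient_deg) <= 3 else 2


def encode_slope_length(length_m):
    """Code 1 for slopes shorter than 1000 m, code 2 for 1000 m or longer."""
    return 1 if length_m < 1000 else 2


# Made-up record for illustration.
record = {"daily_volume": 2600, "age": 35, "gradient": -4.2, "slope_length": 850}
encoded = {
    "daily_volume": encode_right_closed(record["daily_volume"], VOLUME_UPPER_BOUNDS),
    "age": encode_right_closed(record["age"], AGE_UPPER_BOUNDS),
    "gradient": encode_gradient(record["gradient"]),
    "slope_length": encode_slope_length(record["slope_length"]),
}
print(encoded)
```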
Table 2. Optimal hyperparameters
Hyperparameter | Range | Optimal value
booster | "gbtree", "gblinear" | gbtree
n_estimators | [80, 300] | 150
eta | [0.01, 0.30] | 0.10
max_depth | [2, 7] | 3
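The search over the ranges in Table 2 can be sketched with a small genetic algorithm. This is an assumption-laden illustration, not the study's optimizer: the population size, mutation rate, selection scheme, and number of generations are invented, and `toy_fitness` is a hypothetical placeholder for a real objective such as the cross-validated score of an XGBoost model trained with the candidate hyperparameters.

```python
import random

# Search space taken from Table 2; every GA setting below is an assumption.
SPACE = {
    "booster": ["gbtree", "gblinear"],
    "n_estimators": (80, 300),
    "eta": (0.01, 0.30),
    "max_depth": (2, 7),
}


def random_gene(name, rng):
    """Draw one hyperparameter value uniformly from its range."""
    spec = SPACE[name]
    if isinstance(spec, list):
        return rng.choice(spec)
    lo, hi = spec
    return rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)


def random_individual(rng):
    return {name: random_gene(name, rng) for name in SPACE}


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: random_gene(name, rng) if rng.random() < rate else value
        for name, value in individual.items()
    }


def genetic_search(fitness, generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - 2:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = ranked[:2] + children  # elitism: keep the two best
    return max(population, key=fitness)


def toy_fitness(individual):
    """Hypothetical objective; a real run would return a validation score."""
    penalty = abs(individual["max_depth"] - 4) + 10 * abs(individual["eta"] - 0.15)
    penalty += abs(individual["n_estimators"] - 200) / 100
    penalty += 0 if individual["booster"] == "gbtree" else 1
    return -penalty


best = genetic_search(toy_fitness)
print(best)
```

Because the two best individuals are carried over unchanged, the best fitness never decreases from one generation to the next; whether the study used elitism is not stated in the source.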
Table 3. Model performance
Model | Accuracy (A)/% | Precision (P)/% | Recall (R)/%
RF | 77.90 | 73.01 | 74.99
SVM | 75.94 | 73.70 | 70.82
XGBoost | 78.43 | 67.26 | 82.50
GA-XGBoost | 81.57 | 73.12 | 82.68
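For reference, the three metrics reported in Table 3 can be derived from a multi-class confusion matrix as sketched below. The source does not say how precision and recall were averaged across the three severity classes, so macro averaging is used here purely as an assumption. The function name and the example counts are hypothetical; the counts are not the study's results.

```python
def classification_metrics(confusion):
    """Return (accuracy, macro precision, macro recall).

    confusion[i][j] is the number of class-i accidents predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total

    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precisions.append(confusion[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(confusion[k][k] / actual_k if actual_k else 0.0)

    return accuracy, sum(precisions) / n, sum(recalls) / n


# Made-up three-class counts for illustration only.
example = [
    [50, 8, 2],
    [10, 25, 5],
    [1, 3, 6],
]
accuracy, precision, recall = classification_metrics(example)
print(f"A={accuracy:.4f} P={precision:.4f} R={recall:.4f}")
```

With imbalanced classes, macro-averaged recall weights the rare class as heavily as the common ones, so it responds strongly to changes in how well the rare class is detected.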