Coupling Analysis of Causative Factors for Severe Traffic Accidents Considering Sample Imbalance
Abstract: Road traffic accidents occur frequently, yet data classified by conventional accident severity levels are often imbalanced. To explore how the coupling of multidimensional factors affects severe traffic accidents under sample imbalance, this study proposes an analytical framework that integrates the adaptive synthetic sampling (ADASYN) algorithm, a Stacking ensemble learning model, and the Apriori algorithm. Using data from the U.S. Department of Transportation's Fatality Analysis Reporting System (FARS) from 2017 to 2021, 15 candidate feature variables are selected across four dimensions (human, vehicle, road, and environment) to analyze the effect of multidimensional factor coupling on severe accidents. ADASYN is used to address sample imbalance. Four classical machine learning models, namely random forest (RF), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT), serve as base learners. Five meta-learners, namely logistic regression (LR), Gaussian naive Bayes (GNB), support vector machine (SVM), light gradient boosting machine (LightGBM), and multilayer perceptron (MLP), are compared to identify the Stacking ensemble with the best generalization performance when combined with the chosen base learners. The feature importance ranking from the best model is used to select the key factors, and the Apriori algorithm is then applied to analyze the coupling of these five factors and its effect on the severe accident rate. The results show that: ① the Stacking ensemble with logistic regression as the meta-learner and RF, CatBoost, XGBoost, and GBDT as base learners performs best overall, with a recall of 0.80; ② five factors (road type, season, collision type, lighting conditions at the time of the collision, and driver alcohol consumption) account for 53.2% of the total importance of all factors, substantially more than the other variables. Among their categories, "collision with trees or other pole-like objects" has the highest severe accident rate, at 86.2%, and the severe accident rate under lit conditions is 13.5% higher than under unlit conditions; ③ the multidimensional coupling analysis shows that the probability of a severe accident is highest when municipal roads, sober drivers, an unlit-lit lighting transition at the time of the collision, and the autumn season coincide. Under this combination the confidence reaches 89.0%, which challenges the conventional view that not drinking is a low-risk factor.
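The oversampling step named in the abstract can be sketched in a few lines. The following is a minimal, standard-library illustration of the general ADASYN idea (more synthetic samples around minority points whose neighbourhoods are dominated by the majority class), not the authors' implementation. The function name `adasyn`, its parameters, and the toy data are all hypothetical, and the sketch assumes numeric features; how the study handled its discretized categorical codes during interpolation is not stated in the abstract.

```python
import math
import random


def adasyn(X, y, minority=1, k=2, beta=1.0, seed=0):
    """Generate synthetic minority samples, ADASYN-style (numeric features only)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    # G: total synthetic samples needed; beta=1.0 targets a fully balanced set.
    G = int((n_majority - len(minority_idx)) * beta)

    def nearest(i, candidates):
        # Indices of the k candidates closest to sample i (excluding i itself).
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: fraction of majority-class points among the k nearest neighbours of x_i.
    r = [sum(y[j] != minority for j in nearest(i, range(len(X)))) / k
         for i in minority_idx]
    total = sum(r)
    if G <= 0 or total == 0:
        return []

    synthetic = []
    for i, r_i in zip(minority_idx, r):
        g_i = round(r_i / total * G)  # harder-to-learn points get more samples
        neighbours = nearest(i, minority_idx)
        if not neighbours:
            continue
        for _ in range(g_i):
            z = rng.choice(neighbours)
            lam = rng.random()
            # Interpolate between x_i and a minority-class neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic


# Hypothetical toy data: 7 majority points (label 0) and 3 minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [3, 0],
     [2.5, 0.5], [6, 6], [6, 7]]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
synthetic = adasyn(X, y)
print(len(synthetic))
```

In this toy set the minority point sitting inside the majority cluster receives the largest share of synthetic samples, which is the adaptive behaviour that distinguishes ADASYN from uniform oversampling.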
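Table 1. Variable values and discretization

| Variable category | Variable name | Values |
| --- | --- | --- |
| Dependent variable | Driver injury severity | 0 = fatal, 1 = non-fatal |
| Independent: vehicle characteristics | Vehicle type | 0 = passenger vehicle, 1 = freight vehicle, 2 = other |
| Independent: vehicle characteristics | Vehicle speed before collision (km/h) | 0: = 0; 1: > 0 to 155; 2: > 155 to 245 |
| Independent: time characteristics | Holiday | 0 = yes, 1 = no |
| Independent: time characteristics | Rest day or workday | 0 = rest day, 1 = workday |
| Independent: time characteristics | Time of day | 0 = early morning, 1 = daytime, 2 = evening, 3 = night |
| Independent: road characteristics | Road type where the accident occurred | 0 = national highway, 1 = frontage road, 2 = county road, 3 = township road, 4 = interstate highway, 5 = state highway, 6 = municipal road |
| Independent: road characteristics | At an intersection | 0 = yes, 1 = no |
| Independent: road characteristics | Accident location relative to the roadway | 0 = on the roadway, 1 = shoulder, 2 = off the roadway |
| Independent: road characteristics | Speed limit (km/h) | 0: > 0 to 50; 1: > 50 to 90; 2: > 90 to 140 |
| Independent: area characteristics | Area of the accident | 0 = urban, 1 = rural |
| Independent: collision characteristics | Collision type | 0 = multi-vehicle collision, 1 = collision with traffic infrastructure, 2 = collision with trees or other pole-like objects, 3 = other |
| Independent: collision characteristics | Lighting conditions at the time of the collision | 0 = lit, 1 = unlit, 2 = unlit-lit transition |
| Independent: environmental characteristics | Weather | 0 = adverse weather, 1 = non-adverse weather (clear, cloudy) |
| Independent: environmental characteristics | Season | 0 = spring, 1 = summer, 2 = autumn, 3 = winter |
| Independent: driver characteristics | Driver alcohol consumption | 0 = yes, 1 = no |

Note: Speed values in km/h were converted from MPH (miles per hour) and rounded.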
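Table 2. Model combinations

| Base learners | Meta-learner |
| --- | --- |
| RF, CatBoost, XGBoost, GBDT | Logistic regression |
| RF, CatBoost, XGBoost, GBDT | Gaussian naive Bayes |
| RF, CatBoost, XGBoost, GBDT | Support vector machine |
| RF, CatBoost, XGBoost, GBDT | Light gradient boosting machine |
| RF, CatBoost, XGBoost, GBDT | Multilayer perceptron |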
Table 3. Comparison of Stacking ensemble learning models with different meta-models

| Meta-model | Precision | Recall | F1 score | Accuracy | AUC |
| --- | --- | --- | --- | --- | --- |
| LR | 0.75 | 0.80 | 0.77 | 0.75 | 0.80 |
| GNB | 0.74 | 0.78 | 0.75 | 0.73 | 0.79 |
| SVM | 0.75 | 0.79 | 0.76 | 0.73 | 0.77 |
| LightGBM | 0.72 | 0.75 | 0.74 | 0.72 | 0.76 |
| MLP | 0.73 | 0.80 | 0.77 | 0.73 | 0.79 |
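For readers unfamiliar with the five metrics compared across meta-models, the sketch below shows their standard binary-classification definitions. It is an illustration only: the function names and all counts and scores are hypothetical, not taken from the study, and the averaging convention the authors used (for example per-class versus macro-averaged values) is not stated here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical confusion-matrix counts and predicted scores, not from the study.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
area = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, area)
```

Recall is the share of truly severe accidents that the model flags, which is presumably why it is the headline figure when missing a severe case is the costlier error.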
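Table 4. Importance of key factors

| Key factor | Importance |
| --- | --- |
| Collision type | 4.97 |
| Road type where the accident occurred | 2.68 |
| Season | 2.19 |
| Driver alcohol consumption | 1.93 |
| Lighting conditions at the time of the collision | 1.93 |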
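Table 5. Chi-square statistics of key factors

| Key factor | Chi-square statistic | P value |
| --- | --- | --- |
| Road type where the accident occurred | 653.37 | 0.00 |
| Season | 78.84 | 0.00 |
| Collision type | 4609.93 | 0.00 |
| Lighting conditions at the time of the collision | 1094.02 | 0.00 |
| Driver alcohol consumption | 1372.23 | 0.00 |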
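A chi-square statistic of this kind is typically computed from a contingency table of factor category against outcome. The sketch below shows the Pearson statistic under that assumption; the pairing of each factor with accident severity is an inference, and the function name and the counts are hypothetical rather than taken from the study.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table: rows = factor category, columns = (fatal, non-fatal).
stat, dof = chi_square([[30, 10], [20, 40]])
print(round(stat, 2), dof)
```

Turning the statistic into a P value requires the chi-square distribution; a library routine such as `scipy.stats.chi2_contingency` returns both. Note that it applies a continuity correction to 2x2 tables by default, so its statistic can differ from this uncorrected one.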
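Table 6. Strong coupling rules

| Rule | Road type where the accident occurred | Other items | Support | Confidence | Lift |
| --- | --- | --- | --- | --- | --- |
| 1 | Municipal road | Collision type = other & Driver alcohol consumption = no & Lighting = unlit-lit transition & Season = winter | 0.0123 | 0.88 | 1.39 |
| 2 | Municipal road | Driver alcohol consumption = no & Collision type = other & Lighting = lit & Season = spring | 0.0112 | 0.87 | 1.16 |
| 3 | Municipal road | Collision type = other & Driver alcohol consumption = no & Lighting = unlit-lit transition & Season = autumn | 0.0205 | 0.89 | 2.02 |
| 4 | Municipal road | Collision type = other & Season = summer & Lighting = lit & Driver alcohol consumption = no | 0.0123 | 0.88 | 1.32 |
| 5 | State highway | Collision type = other & Lighting = lit & Season = winter & Driver alcohol consumption = no | 0.0140 | 0.87 | 1.15 |
| 6 | State highway | Collision type = other & Lighting = unlit-lit transition & Driver alcohol consumption = no & Season = autumn | 0.0127 | 0.86 | 1.36 |
| 7 | State highway | Collision type = other & Driver alcohol consumption = no & Lighting = unlit-lit transition & Season = winter | 0.0143 | 0.85 | 1.35 |
| 8 | State highway | Collision type = other & Lighting = unlit-lit transition & Season = winter & Driver alcohol consumption = no | 0.0143 | 0.84 | 1.11 |
| 9 | National highway | Collision type = other & Season = winter & Lighting = unlit & Driver alcohol consumption = no | 0.0117 | 0.86 | 1.14 |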
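The three rule measures used here (support, confidence, lift) have standard definitions in association-rule mining, sketched below on hypothetical records. The function name, the item labels, and the choice of "severity=fatal" as the rule consequent are assumptions made for illustration. The sketch evaluates a single given rule and does not implement Apriori's candidate generation and minimum-support pruning.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)         # records containing the antecedent
    n_c = sum(c <= t for t in transactions)         # records containing the consequent
    n_ac = sum((a | c) <= t for t in transactions)  # records containing both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # > 1 means the antecedent raises the consequent's rate
    return support, confidence, lift


# Hypothetical accident records encoded as item sets, not data from the study.
records = [
    {"road=municipal", "alcohol=no", "severity=fatal"},
    {"road=municipal", "alcohol=no", "severity=fatal"},
    {"road=municipal", "alcohol=no", "severity=nonfatal"},
    {"road=municipal", "alcohol=yes", "severity=fatal"},
    {"road=state", "alcohol=no", "severity=nonfatal"},
    {"road=state", "alcohol=no", "severity=nonfatal"},
    {"road=state", "alcohol=yes", "severity=fatal"},
    {"road=state", "alcohol=no", "severity=nonfatal"},
]
support, confidence, lift = rule_metrics(
    records, {"road=municipal", "alcohol=no"}, {"severity=fatal"}
)
print(support, confidence, lift)
```

Confidence alone can look high simply because the consequent is common. Lift divides out that base rate, which is why a rule is usually judged on both measures together.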