Classifying Road Accidents and Forecasting Level of Risk Based on a Combined PCA-LPP and DBSCAN Method

XIN Yi, LI Gang, DENG Youwei, ZHANG Shengpeng, ZHOU Pan, LIU Yiyang

Citation: XIN Yi, LI Gang, DENG Youwei, ZHANG Shengpeng, ZHOU Pan, LIU Yiyang. Classifying Road Accidents and Forecasting Level of Risk Based on a Combined PCA-LPP and DBSCAN Method[J]. Journal of Transport Information and Safety, 2023, 41(4): 44-54. doi: 10.3963/j.jssn.1674-4861.2023.04.005

doi: 10.3963/j.jssn.1674-4861.2023.04.005
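Funding:

National Key Research and Development Program of China, 2021YFB2601301

Key Research and Development Program of Shaanxi Province, 2020ZDLGY09-03

Guangxi Key Research and Development Program, 桂科AB20159032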

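Details
    About the author:

    XIN Yi (born 2000), master's student. Research interests: machine learning, data governance, and intelligent transportation. E-mail: 2021132037@chd.edu.cn

    Corresponding author:

    LI Gang (born 1975), Ph.D., professor. Research interests: transportation-energy integration, intelligent transportation, machine learning, intelligent detection, and software engineering. E-mail: 15229296166@chd.edu.cn

  • CLC number: U491.31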

Classifying Road Accidents and Forecasting Level of Risk Based on a Combined PCA-LPP and DBSCAN Method

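  • Abstract: Road traffic accidents are a major cause of casualties and property loss worldwide. Classifying accidents and predicting their risk level makes it possible to identify high-risk vehicles and so reduce the probability of accidents and casualties. Accidents usually arise from the interaction of multidimensional features such as environment, weather, road conditions, and road section facilities, yet existing methods for analyzing accident influences lack a comprehensive study of accident data. This paper therefore proposes a traffic accident classification model. Building on the conventional PCA algorithm, it performs a second dimensionality reduction by measuring the similarity among the data of each risk level, and it uses the resulting improved algorithm, PCA-LPP, to process a large-scale traffic accident dataset. The DBSCAN algorithm then partitions the accident data into risk regions, and the per-level spaces obtained through iterative training are used to assign a risk level to simulated vehicle environments. The experiments show the following. In comparisons that reduce the large-scale traffic data to different dimensions, PCA-LPP yields reduced features that correlate more strongly with the sample classes. When the density-based DBSCAN clustering algorithm is applied to the complex and sporadic accident data, it reaches a purity of 0.9429, a Rand index of 0.9462, and a mutual information index of 0.6784, all higher than those of conventional algorithms such as K-means and spectral clustering, and the classification plots show that the model reduces the influence of noisy data. Finally, ablation experiments confirm that PCA-LPP with the second reduction scores highest on every evaluation metric. The confusion matrix of the predictions gives a precision of 85.77%, 70.78%, and 80.65% for the respective risk levels, which supports the effectiveness and practicality of the model.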
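The two-stage reduction described in the abstract can be sketched roughly as follows. The abstract does not give the exact formulation, so this sketch assumes plain PCA followed by the standard unsupervised locality preserving projection (a k-nearest-neighbor graph with heat-kernel weights, then a generalized eigenproblem). The function names `pca_reduce` and `lpp_reduce`, the parameters `n_neighbors` and `t`, and the random demo data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project centered data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def lpp_reduce(Y, n_components, n_neighbors=5, t=1.0):
    """Standard LPP (assumed variant): solve Y^T L Y a = lam * Y^T D Y a
    and keep the directions with the smallest eigenvalues."""
    n = Y.shape[0]
    # Pairwise squared distances; O(n^2) memory, fine for a small demo only.
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    idx = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]  # skip the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)                       # symmetrize the kNN graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian
    A = Y.T @ L @ Y
    B = Y.T @ D @ Y + 1e-9 * np.eye(Y.shape[1])  # small ridge for stability
    # Turn the generalized problem into a symmetric one via Cholesky B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    vals, vecs = np.linalg.eigh(C_inv @ A @ C_inv.T)  # ascending eigenvalues
    projection = C_inv.T @ vecs[:, :n_components]
    return Y @ projection


# Hypothetical demo data: 60 samples with 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
Y = pca_reduce(X, 5)   # first reduction: PCA to 5 dimensions
Z = lpp_reduce(Y, 2)   # second reduction: LPP to 2 dimensions
```

The heat-kernel width `t` is sensitive to feature scale, so in practice the features would likely need standardizing before the first stage.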

     

  • Figure 1. Schematic diagram of PCA-LPP

    Figure 2. Schematic diagram of self-adaptive parameter selection

    Figure 3. Flowchart of the model

    Figure 4. Comparison of PCA and PCA-LPP dimensionality reduction results

    Figure 5. Comparison of PCA-LPP results under different parameters

    Figure 6. Relationship between the density radius (Eps) parameter list and the K value

    Figure 7. Relationship between the number of clusters and the K value

    Figure 8. Relationship between the candidate density radius and each evaluation index, with noise

    Figure 9. Relationship between the candidate density radius and each evaluation index

    Figure 10. Comparison of the classification results of the clustering algorithms

    Figure 11. Confusion matrix
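For readers who want to see how per-level precision figures like those quoted in the abstract are obtained from a confusion matrix, here is a minimal sketch. The counts are invented for illustration and are not the values shown in Figure 11. The layout (rows are true levels, columns are predicted levels) and the name `per_class_precision` are assumptions as well.

```python
def per_class_precision(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.
    Precision of class j = correct predictions of j / all predictions of j."""
    n = len(cm)
    precisions = []
    for j in range(n):
        predicted_j = sum(cm[i][j] for i in range(n))  # column total
        precisions.append(cm[j][j] / predicted_j if predicted_j else 0.0)
    return precisions


# Hypothetical counts for three risk levels; NOT the values from Figure 11.
cm = [
    [80, 15, 5],
    [10, 70, 20],
    [10, 15, 75],
]
precision = per_class_precision(cm)
```

Each column total is the number of samples the model assigned to that level, so a low precision means the level is over-predicted.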

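    Table 1. Statistics of accident influencing factors

    Min, Max, Mean, Std, and Mode are reported for continuous variables; Count and Proportion/% are reported for discrete variables.

    Type                     Variable               Min      Max        Mean       Std       Mode            Count  Proportion/%
    Environment              Daytime                -        -          -          -         -               11719  56.80
                             Month                  1        12         6.5        4.71      12              -      -
                             Night                  -        -          -          -         -               8913   43.20
                             Hour/h                 0        24         11.51      5.48      7               -      -
    Weather                  Weather description    -        -          -          -         Overcast        -      -
                             Temperature/°F         -8       103        35.59      11.07     41              -      -
                             Humidity/%             9        100        74         18.65     Overcast (sic)  -      -
                             Wind chill index/°F    -25.2    107        26.93      13.87     37.1            -      -
                             Wind speed/(m/s)       0        22.620224  4.3765216  2.436368  2.592832        -      -
                             Pressure/kPa           69.2437  105.0676   101.54614  1.72686   101.98632       -      -
    Road conditions          Visibility/km          0        112.654    12.7138    5.6       9.3342          -      -
                             Traffic signal         -        -          -          -         -               2888   14
                             Junction               -        -          -          -         -               1485   7.2
                             Intersection           -        -          -          -         -               1093   5.3
                             Roundabout             -        -          -          -         -               2      0.0097
                             Speed bump             -        -          -          -         -               1      0.0048
                             Roadblock              -        -          -          -         -               433    0.021
    Road section facilities  Recreational facility  -        -          -          -         -               256    1.24
                             Yield sign             -        -          -          -         -               536    0.26
                             Stop sign              -        -          -          -         -               255    1.23
                             Railway                -        -          -          -         -               124    0.60
                             Station                -        -          -          -         -               280    1.36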

    Table 2. Preprocessed data characteristics

    No.     Affected distance/m  Temperature/°F  Relative humidity/%  Pressure/kPa  Visibility/km  Wind speed/(m/s)  Month  Hour
    0       0.01                 41              81                   103.0021      16.09344       3.084576          2      3
    1       1.88                 41              93                   99.3791       4.02336        5.14096           4      9
    2       9.48                 42.1            85                   101.54614     16.09344       1.56464           5      4
    ...     ...                  ...             ...                  ...           ...            ...               ...    ...
    20630   0.01                 46.4            100                  103.0209      16.09344       1.56464           1      7
    20631   0                    46.4            100                  101.98632     4.828          3.621024          1      9
    20632   2.28                 39.9            97                   103.06984     4.02336        4.649216          1      7

    Table 3. Performance comparison of clustering algorithms

    Algorithm                           Purity   RI       MI
    K-means                             0.9022   0.7407   0.4351
    Mean Shift                          0.7021   0.7681   0.5007
    Spectral Clustering                 0.6664   0.7002   0.3898
    DBSCAN (with outliers), 3 clusters  0.8333   0.8182   0.6253
    DBSCAN, 3 clusters                  0.9429   0.9462   0.6784
    DBSCAN (with outliers), 5 clusters  0.8564   0.8411   0.6653
    DBSCAN, 5 clusters                  0.8893   0.8820   0.7362
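The three columns of Table 3 can be computed from a reference labeling and a cluster assignment as sketched below. These are the textbook definitions; the page does not say whether the paper's MI is normalized, so this sketch assumes the unnormalized mutual information in nats. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def purity(true, pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(true, pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true)


def rand_index(true, pred):
    """Fraction of sample pairs on which both labelings agree (same or different)."""
    pairs = list(combinations(range(len(true)), 2))
    agree = sum((true[i] == true[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


def mutual_information(true, pred):
    """Unnormalized mutual information (in nats) between the two labelings."""
    n = len(true)
    joint = Counter(zip(true, pred))
    count_t, count_p = Counter(true), Counter(pred)
    return sum(c / n * log(c * n / (count_t[t] * count_p[p]))
               for (t, p), c in joint.items())


# Toy labels for illustration only: one sample of class 0 lands in cluster 1.
true = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1]
scores = (purity(true, pred), rand_index(true, pred), mutual_information(true, pred))
```

On the toy labels the purity is 7/8 and the Rand index is 21/28. The pairwise loop in `rand_index` is quadratic in the sample count, so a dataset of roughly 20,000 records would call for a contingency-table formulation instead.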

    Table 4. Comparison of clustering results in the ablation experiments

    Algorithm        Density radius  Min. points  Purity   RI       MI
    DBSCAN           720             70           0.5019   0.5113   0.1168
    PCA-3D+DBSCAN    348             70           0.4117   0.5200   0.0175
    PCA-8D+DBSCAN    704             50           0.4984   0.4991   0.1186
    PCA-LPP+DBSCAN   51              63           0.8333   0.8182   0.6253
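Table 4 lists a density radius and a minimum point count for each configuration, and Figures 6 and 7 relate a list of candidate radii to a K value. The page does not spell out how those candidates are generated, so the sketch below assumes one common heuristic: the candidate radius for each K is the mean distance to the K-th nearest neighbor, and the matching minimum point count is the mean neighborhood size at that radius. The DBSCAN here is a minimal NumPy version written for illustration. All function names and the grid data are hypothetical.

```python
import numpy as np


def pairwise_dist(X):
    """Dense Euclidean distance matrix (small demo data only)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def eps_candidates(X, k_max):
    """Candidate Eps for K = 1..k_max: mean distance to the K-th nearest neighbor."""
    d = np.sort(pairwise_dist(X), axis=1)  # column 0 is the point itself
    return d[:, 1:k_max + 1].mean(axis=0)


def min_pts_for(X, eps):
    """Assumed rule: mean number of other points inside each Eps-neighborhood."""
    d = pairwise_dist(X)
    return int(round(((d <= eps).sum(axis=1) - 1).mean()))


def dbscan(X, eps, min_pts):
    """Minimal DBSCAN; returns integer labels, with -1 marking noise."""
    d = pairwise_dist(X)
    neighbors = [np.flatnonzero(row <= eps) for row in d]  # includes the point itself
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]                      # only core points are ever pushed
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster  # border or core point joins the cluster
                    if core[q]:
                        stack.append(q)
        cluster += 1
    return labels


# Hypothetical data: two 3x3 grids far apart plus one isolated point.
side = np.array([0.0, 0.5, 1.0])
grid = np.array([[x, y] for x in side for y in side])
X = np.vstack([grid, grid + 10.0, [[30.0, 30.0]]])

cluster_counts = []
for eps in eps_candidates(X, k_max=4):
    labels = dbscan(X, eps, min_pts_for(X, eps))
    cluster_counts.append(len(set(labels.tolist()) - {-1}))
```

One way to pick a final radius from such a list is to look for the range of K over which the cluster count stays constant, which appears to be what Figure 7 plots.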
Publication history
  • Received: 2023-01-09
  • Published online: 2023-11-23
