SMOTE-LSTM Vehicle Accident Detection Method for Imbalanced Data

WANG Tianshuo (王天硕); GAO Jingbo (高景伯); TONG Shengjun (童盛军); LI Zhenlong (李振龙); ZHAO Xiaohua (赵晓华)

doi: 10.3963/j.jssn.1674-4861.2025.01.005

1. School of Urban Transportation, Beijing University of Technology, Beijing 100124, China
2. Beijing Connected and Autonomous Vehicles Technology Co., Ltd, Beijing 100176, China

About the author:
WANG Tianshuo (born 2001), master's student. Research interest: traffic safety. E-mail: wts159753@126.com

Corresponding author:
LI Zhenlong (born 1976), Ph.D., professor. Research interests: traffic control, driving behavior, etc. E-mail: lzl@bjut.edu.cn

CLC number: U491.31
Publication history
- Received: 2024-05-07
- Published online: 2025-06-27

Abstract: In vehicle accident detection, accident vehicles are far fewer than normal vehicles. The resulting data imbalance makes accident vehicles hard to identify correctly, and they are easily misclassified as normal vehicles. A vehicle accident detection algorithm based on SMOTE-LSTM is therefore proposed. To address the imbalance between accident and normal data, the synthetic minority over-sampling technique (SMOTE) inserts synthetic samples at random positions between accident-class sample points, increasing their number until the accident and normal classes are balanced. During oversampling, the number of neighbors is selected by comparing detection accuracy under different neighbor counts, which improves the recognition rate of accident samples while limiting the noise introduced. On this basis, a long short-term memory (LSTM) network captures the temporal features of the data recorded when a vehicle accident occurs, and a Dropout layer is introduced to reduce overfitting and improve the model's generalization ability. In addition, to reduce the number of accident vehicles misdetected as normal vehicles, class weights are incorporated into the loss function and adjusted so that the model focuses more on accident-class samples. Finally, six groups of comparative experiments were conducted on a collected time-series dataset of vehicle driving states. The first three groups did not use the SMOTE-LSTM-based algorithm; the number of normal samples was increased to produce balanced, mildly imbalanced, and moderately imbalanced conditions. The last three groups used the SMOTE-LSTM-based algorithm under mildly, moderately, and extremely imbalanced conditions. The experimental results show that the proposed method improves Precision, Recall, F1-score, G-mean, and AUC. Under mild class imbalance, these five metrics increase by 56.2, 2.5, 38.7, 5.8, and 5.4 percentage points, respectively (group 4 versus group 2 in Table 4). Under moderate class imbalance, they increase by 75.0, 14.1, 59.0, 8.2, and 7.8 percentage points (group 5 versus group 3 in Table 4). The proposed algorithm therefore handles class imbalance in vehicle accident detection effectively: particularly in mildly and moderately imbalanced scenarios, it improves recognition of the minority class and shows strong robustness and better classification performance.

Keywords: traffic safety; imbalanced data; vehicle accident detection; oversampling technique; long short-term memory network (LSTM)
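
The SMOTE step described in the abstract (inserting synthetic samples between accident-class points, with a tunable number of neighbors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `smote`, its parameters, and the assumption that each sample is a flat feature vector (for example, a flattened time window) are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (assumed flat, hypothetical layout)
    k:        number of nearest minority neighbours considered for each base point
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> target, in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Toy accident-class points; real samples would come from the driving-state data.
accident = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(accident, n_new=6, k=2)
```

A larger `k` lets interpolation reach more distant accident samples, which adds variety but can also place synthetic points in regions occupied by normal samples; this is the trade-off behind comparing detection accuracy across neighbor counts.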

Figure 1. LSTM vehicle accident detection model with SMOTE sampling

Figure 2. LSTM unit structure
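
The figure itself is not reproduced on this page. For orientation, the standard LSTM cell update is commonly written as below. The notation is an assumption of this sketch and may differ from the paper's: $x_t$ is the input at time step $t$ (plausibly the measurement columns of Table 1), $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```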

Figure 3. Comparison chart before and after applying Dropout
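
As a rough illustration of what a Dropout layer does during training, the sketch below implements the common "inverted dropout" variant in plain Python. It is an assumption that the paper's framework uses this variant; the function name and signature are hypothetical. The rates 0.2 and 0.5 in Table 3 would correspond to the `rate` argument.

```python
import random


def dropout(activations, rate, training=True, seed=None):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer passes activations through unchanged.
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    # Surviving units are divided by the keep probability so that the
    # expected value of each activation stays the same as without dropout.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


hidden = [1.0] * 8
train_out = dropout(hidden, rate=0.5, seed=0)          # some units zeroed, others 2.0
eval_out = dropout(hidden, rate=0.5, training=False)   # unchanged
```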

Figure 4. Model training process
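
The abstract states that class weights are added to the loss function so that training focuses more on accident samples. One common way to do this is a class-weighted binary cross-entropy, sketched below. The choice of binary cross-entropy, the function name, and the weight values are assumptions; this page does not give the paper's exact loss or weights.

```python
import math


def weighted_bce(y_true, y_prob, w_accident=1.0, w_normal=1.0, eps=1e-12):
    """Mean binary cross-entropy with per-class weights (label 1 = accident)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_accident * math.log(p)        # missed accidents cost more
        else:
            total += -w_normal * math.log(1.0 - p)
    return total / len(y_true)


labels = [1, 0, 0, 0]
probs = [0.3, 0.2, 0.1, 0.4]
plain = weighted_bce(labels, probs)
weighted = weighted_bce(labels, probs, w_accident=3.0)  # hypothetical weight
```

With `w_accident` greater than `w_normal`, a low predicted probability on a true accident sample contributes more to the loss than an equally wrong prediction on a normal sample, which pushes the model toward fewer missed accidents.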

Figure 5. Visualization of evaluation indicators

Table 1. Sample of the processed data

Speed	Longitudinal acceleration	Lateral acceleration	Accelerator pedal opening	Brake pedal opening	Vehicle ID
0.0	-0.02	0.00	0	22	9
0.0	-0.09	-0.03	0	23	9
0.0	-0.01	-0.02	0	23	9
0.0	0.00	0.05	0	23	9
0.0	-0.03	-0.04	0	23	9
0.0	-0.08	0.01	0	23	9
0.0	0.00	0.00	0	23	9
0.0	-0.02	-0.01	0	23	9
0.0	-0.04	-0.02	0	23	9

Table 2. Sample size of each group

Group	Original normal vehicles	Original accident vehicles	Training set samples	Oversampled
1	78	78	125	No
2	468	78	437	No
3	4 346	78	3 539	No
4	468	78	745	Yes
5	4 346	78	6 954	Yes
6	98 015	78	156 824	Yes
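
The degree of imbalance in each group can be read off Table 2 as the ratio of normal to accident vehicles. The short calculation below uses the counts from the table; the mapping of ratios to the abstract's "mild", "moderate", and "extreme" labels is inferred from the order in which the groups are described, not stated explicitly on this page.

```python
# (normal vehicles, accident vehicles) per group, copied from Table 2
groups = {
    1: (78, 78),
    2: (468, 78),
    3: (4346, 78),
    4: (468, 78),
    5: (4346, 78),
    6: (98015, 78),
}


def imbalance_ratio(normal, accident):
    """Number of normal vehicles per accident vehicle."""
    return normal / accident


ratios = {g: round(imbalance_ratio(n, a), 1) for g, (n, a) in groups.items()}
for g, r in ratios.items():
    print(f"group {g}: {r}:1")
```

Groups 2 and 4 work out to 6:1, groups 3 and 5 to roughly 56:1, and group 6 to roughly 1257:1, which appears to match the mild, moderate, and extreme imbalance settings.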

Table 3. LSTM model hyperparameters

Group	Layers	Nodes	Training epochs	Batch size	Learning rate	Dropout rate
1	2	16	500	64	0.001	0.2
2	3	18	500	64	0.001	0.2
3	3	20	500	64	0.001	0.2
4	4	22	500	64	0.001	0.5
5	4	22	500	64	0.001	0.5
6	6	23	500	64	0.001	0.5

Table 4. Evaluation metric values

Group	Precision	Recall	F1	G-mean	AUC
1	0.688	0.786	0.730	0.753	0.754
2	0.375	0.857	0.522	0.879	0.880
3	0.250	0.800	0.380	0.888	0.893
4	0.937	0.882	0.909	0.937	0.934
5	1	0.941	0.970	0.970	0.971
6	1	0.276	0.433	0.535	0.638
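
For readers unfamiliar with the metrics in Table 4, the sketch below computes Precision, Recall, F1, and G-mean from a binary confusion matrix, treating "accident" as the positive class. It uses the common definition of G-mean as the geometric mean of recall and specificity, which is assumed rather than confirmed for this paper. The counts in the example are hypothetical and are not taken from the paper's experiments. AUC is omitted because it requires predicted scores, not only counts.

```python
import math


def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Metrics for the positive (accident) class from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)           # sensitivity, true positive rate
    specificity = _ratio(tn, tn + fp)      # true negative rate
    f1 = _ratio(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical test-set counts, chosen only for illustration.
metrics = binary_metrics(tp=15, fp=1, fn=2, tn=100)
```

Because G-mean multiplies the per-class rates, it drops sharply when either class is poorly recognized, which makes it more informative than plain accuracy on imbalanced data.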
