A Recognition Model for Passenger Boarding and Alighting Action Based on Improved Temporal Pyramid Network
Abstract: Traditional image-processing-based algorithms for identifying illegal passenger-carrying behavior rely on manually crafted human-vehicle interaction rules to detect boarding and alighting actions. However, because traffic scenes are highly complex, such hand-crafted rule sets are inevitably incomplete, which degrades recognition performance. A deep learning model based on the temporal pyramid network (TPN) is therefore introduced for boarding and alighting action recognition: by training on a large sample set, more complete features of taxi passengers' boarding and alighting behavior are extracted, improving recognition accuracy. To address the TPN model's inability to distinguish driver and passenger roles, the output layer is redesigned around door-area perception, which improves the efficiency of multi-dimensional feature extraction. To address the large spatiotemporal span of boarding and alighting actions, which makes the model susceptible to interference from irrelevant movements, a sliding window mechanism based on dynamic window weights is added to capture the key video frames of an action and improve recognition efficiency. Combining these improvements, a boarding and alighting neural network (BANN) model based on door-area perception and dynamic weights is proposed to recognize illegal passenger-carrying behavior efficiently and accurately. A training set of 4,047 annotated video clips and a test set of 810 unannotated video clips, built from surveillance videos at Beijing Capital Airport, are used to validate the model. Experimental results show that the BANN model achieves a precision of 90.21% and a recall of 88.53%, improvements of 9.78 and 11.04 percentage points over the baseline TPN model, and can well meet the needs of traffic order supervision at transportation hubs.
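The dynamic-window-weight mechanism is only summarized above. As a rough illustration of the idea (not the authors' implementation), the following Python sketch aggregates per-frame classifier scores inside each sliding window using weights that emphasize frames near the door region; the function name, the use of door-region overlap as the weight signal, and all parameter values are illustrative assumptions.

```python
# A minimal sketch of a dynamic-weight sliding window over per-frame scores.
# All names and the weighting scheme are assumptions for illustration only.
import numpy as np

def window_score(frame_scores, door_overlap, win=32, stride=8):
    """Return (start_index, weighted_score) of the best-scoring window.

    frame_scores : (T,) per-frame action confidences (assumed given).
    door_overlap : (T,) overlap of the person box with the door region,
                   used here as a stand-in for the dynamic window weights.
    """
    best = (0, -np.inf)
    for s in range(0, len(frame_scores) - win + 1, stride):
        w = door_overlap[s:s + win]
        w = w / (w.sum() + 1e-8)          # normalize weights inside the window
        score = float(np.dot(w, frame_scores[s:s + win]))
        if score > best[1]:
            best = (s, score)
    return best
```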
Table 1. Positive and negative sample classification for passenger boarding and alighting actions
Positive sample:
  Passenger boarding or alighting
Negative samples (partial):
  Driver opening or closing a door
  Driver getting out, moving around briefly, then getting back in
  Driver getting in alone and driving away
  Pedestrian passing by the vehicle
  Passenger talking with the driver, then leaving on foot
  Passenger boarding and then alighting again
  Passenger alighting and then returning for luggage, etc.

Table 2. Experimental training parameters
Parameter                 Value
Iterations                1,000
Initial learning rate     0.0003
Momentum                  0.99
Weight decay              0.0001
Learning rate schedule    Cosine annealing
Batch size                2
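As a minimal sketch, the hyperparameters in Table 2 map directly onto a standard PyTorch optimizer and scheduler configuration; `model` below is a placeholder, since the BANN architecture itself is not defined in this excerpt.

```python
import torch

model = torch.nn.Linear(1, 1)  # placeholder; stands in for the BANN network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=3e-4,            # initial learning rate 0.0003
    momentum=0.99,      # momentum
    weight_decay=1e-4,  # weight decay 0.0001
)
# cosine annealing over the 1,000 training iterations (Table 2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)
# batch size 2 would be set on the data loader, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
```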
Table 3. Dataset sample distribution
Sample category                         Count
Total                                   4,047
Negative samples                        1,116
Boarding samples                        1,437
Alighting samples                       1,512
Driver boarding/alighting samples       1,213
Passenger boarding/alighting samples    2,298

Table 4. Experimental results of existing action recognition methods
Network model                               Precision/%    Recall/%
C3D                                         85.89          84.39
SlowFast                                    88.90          87.39
TimeSformer                                 90.21          89.25
TPN baseline (without door loss function)   90.39          89.91
BANN                                        95.47          93.89

Table 5. Test results of driver and passenger boarding and alighting actions
Model          Precision/%    Recall/%    TP     FP     FN
C3D            61.14          69.88       225    143    97
SlowFast       64.40          75.24       237    131    78
TimeSformer    66.30          75.08       244    124    81
TPN            80.43          77.49       296    72     86
BANN           90.21          88.53       332    36     43
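As a quick consistency check, the precision and recall values in Table 5 follow directly from the TP/FP/FN counts; the sketch below reproduces the BANN row.

```python
# Precision = TP/(TP+FP), Recall = TP/(TP+FN), as used in Table 5.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=332, fp=36, fn=43)  # BANN row of Table 5
print(f"precision = {p:.4f}, recall = {r:.4f}")
# precision = 0.9022, recall = 0.8853 (Table 5 reports 90.21% and 88.53%)
```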