A Recognition Model for Passenger Boarding and Alighting Action Based on Improved Temporal Pyramid Network

LIAO Huimin; LUO Jingming; ZHANG Jinghui; LIU Wenping; DONG Wanqing; XIAO Hui; HUANG Jian

doi:10.3963/j.jssn.1674-4861.2024.06.010

Volume 42 Issue 6

Dec. 2024

LIAO Huimin, LUO Jingming, ZHANG Jinghui, LIU Wenping, DONG Wanqing, XIAO Hui, HUANG Jian. A Recognition Model for Passenger Boarding and Alighting Action Based on Improved Temporal Pyramid Network[J]. Journal of Transport Information and Safety, 2024, 42(6): 95-102. doi: 10.3963/j.jssn.1674-4861.2024.06.010

1. Transportation Comprehensive Enforcement Corps, Beijing 100044, China
2. School of Software, Beihang University, Beijing 100089, China
3. Zhonglu High tech Transportation Technology Group Co., Ltd, Beijing 100089, China

Received Date: 2023-12-24
Available Online: 2025-03-08

Abstract

Traditional algorithms for identifying illegal passenger-carrying behavior rely on image processing and use manually crafted human-vehicle interaction rules to detect boarding and alighting actions. Because traffic scenes are complex, these rule sets are often incomplete, and recognition performance suffers. A deep learning model based on the temporal pyramid network (TPN) is therefore introduced for boarding and alighting action recognition. Training on a large dataset allows the model to extract more complete features of taxi passenger boarding and alighting behavior, which improves recognition accuracy. Because the TPN model does not distinguish between driver and passenger roles, the output layer is redesigned based on door area perception, which makes multi-dimensional feature extraction more efficient. Because boarding and alighting actions span a large spatiotemporal range and are therefore subject to interference from irrelevant movements, a sliding window mechanism based on dynamic window weights is introduced to capture the key video frames of each action, improving recognition efficiency. Combining these improvements, a boarding and alighting neural network (BANN) model based on door area perception and dynamic weights is proposed to recognize illegal passenger-carrying behavior efficiently and accurately. To validate the model, a training dataset of 4,047 annotated video clips and a test dataset of 810 unannotated video clips are constructed from surveillance videos of Beijing Capital Airport. Experimental results show that the BANN model achieves a precision of 90.21% and a recall of 88.53%, improvements of 9.78% and 11.04%, respectively, over the baseline TPN model. These results indicate that the BANN model can meet the needs of traffic order supervision in transportation hubs.
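The abstract does not specify how the dynamic window weights are computed, so the following is only a minimal sketch of the general idea of sliding-window key-frame selection, not the BANN implementation. It assumes a hypothetical per-frame relevance score (for example, activity near the door area) and, as an illustrative choice, takes each window's weight to be a softmax over the mean score of its frames. The names `window_weights`, `select_key_frames`, `frame_scores`, `window`, and `stride` are all assumptions introduced for this example.

```python
import math
from typing import List, Tuple


def window_weights(frame_scores: List[float], window: int,
                   stride: int) -> List[Tuple[int, float]]:
    """Return (start_index, weight) per window; the weights sum to 1.

    The softmax-of-mean weighting is an assumption made for illustration;
    the paper's actual weighting scheme is not given in the abstract.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(frame_scores) < window:
        raise ValueError("clip is shorter than the window")
    starts = list(range(0, len(frame_scores) - window + 1, stride))
    means = [sum(frame_scores[s:s + window]) / window for s in starts]
    peak = max(means)  # subtract the maximum for numerical stability
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    return [(s, e / total) for s, e in zip(starts, exps)]


def select_key_frames(frame_scores: List[float], window: int,
                      stride: int) -> List[int]:
    """Return the frame indices of the highest-weight window."""
    weights = window_weights(frame_scores, window, stride)
    best_start, _ = max(weights, key=lambda pair: pair[1])
    return list(range(best_start, best_start + window))


# Hypothetical scores: the action is concentrated in frames 3 to 5.
scores = [0.1, 0.1, 0.2, 0.9, 0.8, 0.9, 0.2, 0.1]
print(select_key_frames(scores, window=3, stride=1))  # [3, 4, 5]
```

In this toy clip the window starting at frame 3 has the highest mean score, so its three frames are kept and the low-activity frames before and after are discarded. This mirrors the stated goal of suppressing irrelevant movements outside the key frames.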
