A Recognition Model for Passenger Boarding and Alighting Action Based on Improved Temporal Pyramid Network

LIAO Huimin, LUO Jingming, ZHANG Jinghui, LIU Wenping, DONG Wanqing, XIAO Hui, HUANG Jian

Citation: LIAO Huimin, LUO Jingming, ZHANG Jinghui, LIU Wenping, DONG Wanqing, XIAO Hui, HUANG Jian. A Recognition Model for Passenger Boarding and Alighting Action Based on Improved Temporal Pyramid Network[J]. Journal of Transport Information and Safety, 2024, 42(6): 95-102. doi: 10.3963/j.jssn.1674-4861.2024.06.010


doi: 10.3963/j.jssn.1674-4861.2024.06.010
Funding:

National Key R&D Program of China (2022YFB2602104)

Beijing Transportation Industry Science and Technology Project (0686-2241B1251414Z)

Independent Research Project of the National Key Laboratory of Vehicle-Road Integrated Intelligent Transportation (2021-Z011)

    Author information:

    LIAO Huimin (1981—), master's student. Research interests: smart transportation, big data applications. E-mail: liaohuimin@jtw.beijing.gov.cn

    Corresponding author:

    HUANG Jian (1975—), Ph.D., associate professor. Research interests: intelligent transportation, artificial intelligence. E-mail: hj@buaa.edu.cn

  • CLC number: U495


  • Abstract: Traditional image-processing-based algorithms for recognizing illegal passenger pick-up rely on hand-crafted human-vehicle interaction rules to decide whether a boarding or alighting event has occurred. Because traffic scenes are complex, such hand-crafted rule sets are incomplete, and recognition performance suffers. A deep learning model based on the temporal pyramid network (TPN) is therefore introduced for boarding/alighting action recognition: training on a large sample set extracts a relatively complete set of features of taxi passengers' boarding and alighting behavior and improves recognition accuracy. To address the TPN model's inability to distinguish driver and passenger roles, the output layer is redesigned around door-region awareness, improving the efficiency of multi-dimensional feature extraction. To address the large spatio-temporal span of boarding/alighting behavior and the model's susceptibility to interference from irrelevant actions, a sliding-window mechanism with dynamic window weights is added to capture the key video frames of an action and improve recognition efficiency. Combining these improvements, a taxi passenger boarding and alighting action recognition model based on door-region awareness and dynamic weights (boarding and alighting neural network, BANN) is proposed, enabling efficient and accurate recognition of illegal passenger-carrying behavior. The model is validated on a training set of 4,047 annotated video clips and a test set of 810 unannotated clips built from surveillance video at Beijing Capital International Airport. Experimental results show that BANN reaches a precision of 90.21% and a recall of 88.53%, improvements of 9.78 and 11.04 percentage points over the baseline TPN model, which can well satisfy the needs of traffic-order supervision at transport hubs.
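
As a rough illustration of the dynamic-weight sliding-window mechanism described in the abstract, the sketch below scores overlapping clips of a long video and fuses the clip scores with weights that emphasize the strongest responses. The window length, stride, softmax weighting, and the `model` callable are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def weighted_sliding_window_score(frames, model, win_len=32, stride=8):
    """Score a long video by sliding a fixed-length window over its frames.

    Illustrative sketch only: win_len, stride, and the softmax weighting
    are assumed values rather than the settings used in the paper.
    """
    scores = []
    for start in range(0, max(len(frames) - win_len, 0) + 1, stride):
        clip = frames[start:start + win_len]
        scores.append(model(clip))  # assumed: model(clip) -> P(boarding/alighting)
    scores = np.asarray(scores, dtype=float)

    # Dynamic weights: windows with stronger responses contribute more, so long
    # idle stretches and irrelevant motion dilute the final score less.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return float((weights * scores).sum())

# Toy usage with a dummy scoring function standing in for the trained network.
frames = np.random.rand(200, 224, 224, 3)
print(weighted_sliding_window_score(frames, lambda clip: float(clip.mean())))
```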

     

  • Figure 1. Boarding and alighting actions of passengers and drivers

    Figure 2. Long-duration passenger boarding/alighting actions

    Figure 3. Data annotation diagram

    Figure 4. TPN architecture

    Figure 5. The structure of BANN

    Figure 6. Structure of the output part of the boarding and alighting action recognition network

    Figure 7. Flowchart of the dynamic sliding-window module

    Figure 8. Experimental results of the sliding-window module

    Figure 9. BANN detection results

    Figure 10. Recognition of illegal passenger-carrying incidents in complex environments

    Table 1. Positive and negative sample division for passenger boarding and alighting actions

    Positive sample: passenger boarding/alighting
    Negative samples (partial):
      Driver opening or closing the car door
      Driver getting out briefly and then getting back in
      Driver getting in alone and driving away
      Pedestrian passing by the vehicle
      Passenger talking with the driver and then walking away
      Passenger boarding and then getting back out
      Passenger alighting and then returning for luggage, etc.
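
Table 1 implies a binary labelling scheme in which only genuine passenger boarding/alighting counts as positive, while visually similar driver and pedestrian activity is negative. A hypothetical encoding of that split is sketched below; the event names are illustrative and are not identifiers from the paper.

```python
# Hypothetical encoding: 1 = passenger boarding/alighting, 0 = everything else.
POSITIVE_EVENTS = {"passenger_boarding", "passenger_alighting"}
NEGATIVE_EVENTS = {
    "driver_opens_or_closes_door",
    "driver_exits_briefly_then_reboards",
    "driver_boards_alone_and_departs",
    "pedestrian_passes_vehicle",
    "passenger_talks_with_driver_then_leaves_on_foot",
    "passenger_boards_then_alights",
    "passenger_alights_then_returns_for_luggage",
}

def clip_label(event: str) -> int:
    """Map an annotated event name to its binary training label."""
    if event in POSITIVE_EVENTS:
        return 1
    if event in NEGATIVE_EVENTS:
        return 0
    raise ValueError(f"unknown event: {event}")
```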

    Table 2. Experimental training parameters

    Parameter                  Value
    Number of iterations       1,000
    Initial learning rate      0.0003
    Momentum                   0.99
    Weight decay               0.0001
    Learning rate schedule     Cosine annealing
    Batch size                 2
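
The hyperparameters in Table 2 map onto a standard optimizer and scheduler configuration. A minimal PyTorch sketch follows; the choice of SGD and the placeholder backbone are assumptions (the page does not state them), while the numeric values come from the table.

```python
import torch
import torch.nn as nn

# Placeholder 3D layer standing in for the BANN/TPN backbone, which is not reproduced here.
model = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=3e-4,            # initial learning rate 0.0003 (Table 2)
    momentum=0.99,      # momentum (Table 2)
    weight_decay=1e-4,  # weight decay 0.0001 (Table 2)
)
# Cosine-annealing schedule over the 1,000 training iterations listed in Table 2.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
# The batch size of 2 would be passed to the DataLoader,
# e.g. DataLoader(dataset, batch_size=2, shuffle=True).
```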

    Table 3. Sample distribution of the dataset

    Sample category                 Number of clips
    Total                           4,047
    Negative samples                1,116
    Boarding                        1,437
    Alighting                       1,512
    Driver boarding/alighting       1,213
    Passenger boarding/alighting    2,298

    Table 4. Experimental results of existing action recognition methods

    Model                                       Precision/%    Recall/%
    C3D                                         85.89          84.39
    SlowFast                                    88.90          87.39
    TimeSformer                                 90.21          89.25
    TPN baseline (without door loss function)   90.39          89.91
    BANN                                        95.47          93.89

    Table 5. Test results for driver and passenger boarding/alighting actions

    Model          Precision/%    Recall/%    TP     FP     FN
    C3D            61.14          69.88       225    143    97
    SlowFast       64.40          75.24       237    131    78
    TimeSformer    66.30          75.08       244    124    81
    TPN            80.43          77.49       296    72     86
    BANN           90.21          88.53       332    36     43
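
The precision and recall columns in Table 5 follow directly from the listed TP/FP/FN counts via precision = TP/(TP+FP) and recall = TP/(TP+FN). A quick check for the BANN row:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=332, fp=36, fn=43)
print(f"{100 * p:.2f} {100 * r:.2f}")  # 90.22 88.53 (matches the BANN row up to rounding)
```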
Publication history
  • Received: 2023-12-24
  • Available online: 2025-03-08
