A Novel Ship Driver Behavior Recognition Approach Based on Improved TSM
Abstract: Irregular operations by crew on board are a major cause of maritime accidents, so a real-time method for detecting ship driver behavior is of substantial importance. Compared with automobile driving or security surveillance, the ship's bridge environment is more complex, posing challenges such as the inability to monitor multiple crew members simultaneously, low efficiency, and low recognition accuracy. To address these problems, a "two-step" multi-person behavior recognition approach that combines multi-target tracking with behavior recognition is proposed. First, a multi-target tracker built on YoloV7 and ByteTracker produces a continuous feature map for each crew member. Then, building on the temporal shift module (TSM) algorithm for single-target behavior recognition, the approach processes these continuous feature maps with techniques such as oversampling and cross-frame stitching, and combines EfficientNet-B3 with a coordinate attention (CA) module to output highly accurate recognition results. The research establishes a ship's bridge behavior dataset, "SC-Action", built from surveillance videos of different ship bridges and containing 2,000 samples of both regular and irregular behaviors. Transfer learning and ablation experiments on this dataset show that the proposed method achieves real-time behavior recognition of three crew members at 24 frames per second, with recognition speed and accuracy both superior to mainstream algorithms. In single-person behavior recognition tests, accuracy improved by 1.3% over the baseline TSM model once the image enhancement module was applied; adding the attention mechanism raised accuracy by a further 1.78%, to 82.1%, while increasing the computational load by only 0.1%. In multi-target tests, the method also surpassed leading approaches such as SlowFast in practical inference speed and overall performance, confirming its effectiveness.
Key words:
- navigation safety /
- behavior recognition /
- target tracking /
- attention mechanism /
- temporal shift module
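The two-step pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration, assuming hypothetical wrapper objects `detector` (YoloV7), `tracker` (ByteTrack), and `classifier` (the TSM-based recognizer); the paper does not publish these interfaces.

```python
# Sketch of the "two-step" approach: per-frame detection + tracking, then
# clip-level behavior recognition per tracked crew member.
# `detector`, `tracker`, and `classifier` are hypothetical stand-ins for the
# paper's components, not a published API.
from collections import defaultdict, deque

import cv2

CLIP_LEN = 8  # frames per recognition clip, a common TSM setting (assumption)

def run_pipeline(video_path, detector, tracker, classifier):
    """Track each person, buffer CLIP_LEN crops per track, then classify."""
    buffers = defaultdict(lambda: deque(maxlen=CLIP_LEN))  # track_id -> crops
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        boxes = detector.detect(frame)          # [(x1, y1, x2, y2, score), ...]
        tracks = tracker.update(boxes, frame)   # [(track_id, x1, y1, x2, y2), ...]
        for tid, x1, y1, x2, y2 in tracks:
            crop = frame[int(y1):int(y2), int(x1):int(x2)]
            buffers[tid].append(cv2.resize(crop, (448, 448)))  # resolution from Table 1
        for tid, clip in buffers.items():
            if len(clip) == CLIP_LEN:           # a full per-person clip is ready
                label = classifier.predict(list(clip))
                print(f"track {tid}: behavior = {label}")
    cap.release()
```

Because each tracked crew member accumulates a separate buffer of crops, recognition always operates on a per-person clip rather than the full frame, which is what lets several crew members be handled at once.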
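At the heart of TSM is a zero-FLOP channel shift along the time axis. The sketch below follows the commonly published formulation (shifting 1/8 of the channels one step each way, zero-padded at the clip boundary); it is illustrative rather than the paper's exact code.

```python
# Core temporal shift of TSM: exchange information across frames by moving a
# fraction of the channels along the time axis, at no extra multiply-adds.
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (N, T, C, H, W); shifts C/fold_div channels each way in time."""
    n, t, c, h, w = x.size()
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift left in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift right in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unchanged
    return out
```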
Table 1. Structure of the backbone network

| Stage | Operator | Resolution | Channels | Layers |
|---|---|---|---|---|
| 1 | Conv3x3 | 448×448 | 32 | 1 |
| 2 | MBConv1, k3x3 | 224×224 | 16 | 2 |
| 3 | MBConv6, k3x3 | 224×224 | 24 | 3 |
| 4 | MBConv6, k5x5 | 112×112 | 40 | 3 |
| 5 | MBConv6, k3x3 | 56×56 | 80 | 5 |
| 6 | MBConv6, k5x5 | 28×28 | 112 | 5 |
| 7 | MBConv6, k5x5 | 28×28 | 192 | 6 |
| 8 | MBConv6, k3x3 | 14×14 | 320 | 2 |
| 9 | Conv1x1 | 14×14 | 1280 | 1 |
| 10 | CA_Block | 14×14 | 1280 | 1 |
| 11 | Pooling & FC | 14×14 | 7 | 1 |
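Stage 10 of Table 1 applies coordinate attention to the 1280-channel feature map. A compact PyTorch version in the spirit of the cited CA design is sketched below; the reduction ratio and activation are assumptions, not values taken from the paper.

```python
# Minimal coordinate attention (CA) block: pool along height and width
# separately, encode jointly, then gate the input with two direction-aware
# attention maps. Hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class CABlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.size()
        # Encode position by pooling along each spatial direction separately.
        x_h = x.mean(dim=3, keepdim=True)                       # (N, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (N, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * a_h * a_w            # direction-aware reweighting of features
```

Unlike plain channel attention, the two pooled paths preserve position along one axis each, so the block can highlight where in the frame a crew member's action occurs at negligible extra cost.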
Table 2. Comparison of computational load, parameter count, and recognition accuracy across models
| Model | Feature extraction network | Computation/GMAC | Parameters/M | Top-1 accuracy/% |
|---|---|---|---|---|
| TSM | ResNet50 | 132.17 | 23.52 | 77.79 |
| TSM | ResNet101 | 251.62 | 42.51 | 81.4 |
| SlowFast | ResNet50 | 101.16 | 33.66 | 75.1 |
| SlowFast | ResNet101 | 163.88 | 52.87 | 76.66 |
| Improved (ours) | EfficientNet_B3 | 32.28 | 10.93 | 82.1 |
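Figures like those in Table 2 can be reproduced with an off-the-shelf complexity profiler. The snippet below uses `thop` and torchvision's EfficientNet-B3 purely as an illustration (a tooling assumption; the paper does not state how its counts were obtained, and clip-level counts for TSM-style models multiply the per-frame MACs by the number of frames).

```python
# Hedged example: counting MACs and parameters with the thop profiler.
import torch
from thop import profile
from torchvision.models import efficientnet_b3

model = efficientnet_b3(num_classes=7)       # 7 output classes, as in Table 1
dummy = torch.randn(1, 3, 448, 448)          # one frame at Table 1's resolution
macs, params = profile(model, inputs=(dummy,))
print(f"{macs / 1e9:.2f} GMAC per frame, {params / 1e6:.2f} M parameters")
```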
Table 3. Ablation experiment data
| Method | M1 | M2 | M3 |
|---|---|---|---|
| TSM-EfficientNet-B3 | √ | √ | √ |
| + image enhancement | | √ | √ |
| + CA attention | | | √ |
| Computation/GMAC | 32.23 | 32.23 | 32.26 |
| Accuracy/% | 79.03 | 80.32 | 82.1 |

These figures match the abstract: image enhancement lifts accuracy by 1.29 percentage points (M1→M2, reported as 1.3%), and the CA module adds a further 1.78 points while costing only 0.03 GMAC, roughly 0.1% more computation.
Table 4. Comparison of video inference frame rates
| Method | Interval frames | Average frame rate/(frame/s) |
|---|---|---|
| SlowFast-ResNet50 | 0 | 10 |
| ByteTrack+TSM-ResNet50 | 0 | 13 |
| Proposed method | 0 | 15 |
| Proposed method | 5 | 19 |
| Proposed method | 10 | 24 |
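The "interval frames" column reflects a simple scheduling trick: tracking runs on every frame, while the expensive recognition step runs only on every (interval + 1)-th frame and its label is reused in between. A minimal sketch, assuming callables `tracker_step` and `recognize` (hypothetical names):

```python
# Frame-interval scheduling: cheap tracking every frame, costly recognition
# only on sampled frames, with the last label carried forward per track.
def label_stream(frames, tracker_step, recognize, interval: int = 10):
    """frames: iterable of images; tracker_step/recognize: callables (assumed)."""
    last_labels = {}                      # track_id -> most recent label
    for i, frame in enumerate(frames):
        tracks = tracker_step(frame)      # runs on every frame
        if i % (interval + 1) == 0:       # runs on sampled frames only
            for tid, crop in tracks:
                last_labels[tid] = recognize(tid, crop)
        yield {tid: last_labels.get(tid) for tid, _ in tracks}
```

Raising the interval from 0 to 10 trades label latency for throughput, which is how the reported rate climbs from 15 to 24 frame/s.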