A Novel Ship Driver Behavior Recognition Approach Based on Improved TSM
Abstract: Irregular operations by crew on board are a major cause of maritime accidents, so a real-time method for detecting ship driver behavior is of substantial importance. Compared with automobile driving or security surveillance, the ship's bridge environment is more complex, posing challenges such as the inability to monitor multiple crew members simultaneously, low efficiency, and low recognition accuracy. To address these problems, a "two-step" multi-person behavior recognition approach that combines multi-target tracking with behavior recognition is proposed. First, a multi-target tracker built on YoloV7 and ByteTracker produces a continuous feature map for each crew member. Then, building on the temporal shift module (TSM) algorithm for single-target behavior recognition, the approach processes these continuous feature maps with techniques such as oversampling and cross-frame stitching, and combines EfficientNet-B3 with a coordinate attention (CA) module to output highly accurate recognition results. The research establishes a ship's bridge behavior dataset, "SC-Action", built from surveillance videos of different ship bridges and containing 2,000 samples of both regular and irregular behaviors. Transfer learning and ablation experiments on this dataset show that the proposed method achieves real-time behavior recognition of three crew members at 24 frames per second, with recognition speed and accuracy both superior to mainstream algorithms. In single-person behavior recognition tests, accuracy improved by 1.3% over the baseline TSM model once the image enhancement module was applied; adding the attention mechanism raised accuracy by a further 1.78%, to 82.1%, while increasing the computational load by only 0.1%. In multi-target tests, the method also surpassed leading approaches such as SlowFast in practical inference speed and overall performance, confirming its effectiveness.
Key words:
- navigation safety /
- behavior recognition /
- target tracking /
- attention mechanism /
- temporal shift module
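The two-step pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration, assuming hypothetical wrapper objects `detector` (YoloV7), `tracker` (ByteTrack), and `classifier` (the TSM-based recognizer); the paper does not publish these interfaces.

```python
# Sketch of the "two-step" approach: per-frame detection + tracking, then
# clip-level behavior recognition per tracked crew member.
# `detector`, `tracker`, and `classifier` are hypothetical stand-ins for the
# paper's components, not a published API.
from collections import defaultdict, deque

import cv2

CLIP_LEN = 8  # frames per recognition clip, a common TSM setting (assumption)

def run_pipeline(video_path, detector, tracker, classifier):
    """Track each person, buffer CLIP_LEN crops per track, then classify."""
    buffers = defaultdict(lambda: deque(maxlen=CLIP_LEN))  # track_id -> crops
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        boxes = detector.detect(frame)          # [(x1, y1, x2, y2, score), ...]
        tracks = tracker.update(boxes, frame)   # [(track_id, x1, y1, x2, y2), ...]
        for tid, x1, y1, x2, y2 in tracks:
            crop = frame[int(y1):int(y2), int(x1):int(x2)]
            buffers[tid].append(cv2.resize(crop, (448, 448)))  # resolution from Table 1
        for tid, clip in buffers.items():
            if len(clip) == CLIP_LEN:           # a full per-person clip is ready
                label = classifier.predict(list(clip))
                print(f"track {tid}: behavior = {label}")
    cap.release()
```

Because each tracked crew member accumulates a separate buffer of crops, recognition always operates on a per-person clip rather than the full frame, which is what lets several crew members be handled at once.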
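At the heart of TSM is a zero-FLOP channel shift along the time axis. The sketch below follows the commonly published formulation (shifting 1/8 of the channels one step each way, zero-padded at the clip boundary); it is illustrative rather than the paper's exact code.

```python
# Core temporal shift of TSM: exchange information across frames by moving a
# fraction of the channels along the time axis, at no extra multiply-adds.
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (N, T, C, H, W); shifts C/fold_div channels each way in time."""
    n, t, c, h, w = x.size()
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift left in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift right in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unchanged
    return out
```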
Table 1. Structure of the backbone network

| Stage | Operator | Resolution | Channels | Layers |
|---|---|---|---|---|
| 1 | Conv3x3 | 448×448 | 32 | 1 |
| 2 | MBConv1, k3x3 | 224×224 | 16 | 2 |
| 3 | MBConv6, k3x3 | 224×224 | 24 | 3 |
| 4 | MBConv6, k5x5 | 112×112 | 40 | 3 |
| 5 | MBConv6, k3x3 | 56×56 | 80 | 5 |
| 6 | MBConv6, k5x5 | 28×28 | 112 | 5 |
| 7 | MBConv6, k5x5 | 28×28 | 192 | 6 |
| 8 | MBConv6, k3x3 | 14×14 | 320 | 2 |
| 9 | Conv1x1 | 14×14 | 1280 | 1 |
| 10 | CA_Block | 14×14 | 1280 | 1 |
| 11 | Pooling & FC | 14×14 | 7 | 1 |
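Stage 10 of Table 1 applies coordinate attention to the 1280-channel feature map. A compact PyTorch version in the spirit of the cited CA design is sketched below; the reduction ratio and activation are assumptions, not values taken from the paper.

```python
# Minimal coordinate attention (CA) block: pool along height and width
# separately, encode jointly, then gate the input with two direction-aware
# attention maps. Hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class CABlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.size()
        # Encode position by pooling along each spatial direction separately.
        x_h = x.mean(dim=3, keepdim=True)                       # (N, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (N, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * a_h * a_w            # direction-aware reweighting of features
```

Unlike plain channel attention, the two pooled paths preserve position along one axis each, so the block can highlight where in the frame a crew member's action occurs at negligible extra cost.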
Table 2. Comparison of computational load, parameter count, and recognition accuracy across models
| Model | Feature extraction network | Computation/GMAC | Parameters/M | Top-1 accuracy/% |
|---|---|---|---|---|
| TSM | ResNet50 | 132.17 | 23.52 | 77.79 |
| TSM | ResNet101 | 251.62 | 42.51 | 81.4 |
| SlowFast | ResNet50 | 101.16 | 33.66 | 75.1 |
| SlowFast | ResNet101 | 163.88 | 52.87 | 76.66 |
| Improved (ours) | EfficientNet_B3 | 32.28 | 10.93 | 82.1 |
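Figures like those in Table 2 can be reproduced with an off-the-shelf complexity profiler. The snippet below uses `thop` and torchvision's EfficientNet-B3 purely as an illustration (a tooling assumption; the paper does not state how its counts were obtained, and clip-level counts for TSM-style models multiply the per-frame MACs by the number of frames).

```python
# Hedged example: counting MACs and parameters with the thop profiler.
import torch
from thop import profile
from torchvision.models import efficientnet_b3

model = efficientnet_b3(num_classes=7)       # 7 output classes, as in Table 1
dummy = torch.randn(1, 3, 448, 448)          # one frame at Table 1's resolution
macs, params = profile(model, inputs=(dummy,))
print(f"{macs / 1e9:.2f} GMAC per frame, {params / 1e6:.2f} M parameters")
```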
Table 3. Ablation experiment data
| Method | M1 | M2 | M3 |
|---|---|---|---|
| TSM-EfficientNet-B3 | √ | √ | √ |
| + image enhancement | | √ | √ |
| + CA attention | | | √ |
| Computation/GMAC | 32.23 | 32.23 | 32.26 |
| Accuracy/% | 79.03 | 80.32 | 82.1 |

These figures match the abstract: image enhancement lifts accuracy by 1.29 percentage points (M1→M2, reported as 1.3%), and the CA module adds a further 1.78 points while costing only 0.03 GMAC, roughly 0.1% more computation.
Table 4. Comparison of video inference frame rates
| Method | Interval frames | Average frame rate/(frame/s) |
|---|---|---|
| SlowFast-ResNet50 | 0 | 10 |
| ByteTrack+TSM-ResNet50 | 0 | 13 |
| Proposed method | 0 | 15 |
| Proposed method | 5 | 19 |
| Proposed method | 10 | 24 |
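The "interval frames" column reflects a simple scheduling trick: tracking runs on every frame, while the expensive recognition step runs only on every (interval + 1)-th frame and its label is reused in between. A minimal sketch, assuming callables `tracker_step` and `recognize` (hypothetical names):

```python
# Frame-interval scheduling: cheap tracking every frame, costly recognition
# only on sampled frames, with the last label carried forward per track.
def label_stream(frames, tracker_step, recognize, interval: int = 10):
    """frames: iterable of images; tracker_step/recognize: callables (assumed)."""
    last_labels = {}                      # track_id -> most recent label
    for i, frame in enumerate(frames):
        tracks = tracker_step(frame)      # runs on every frame
        if i % (interval + 1) == 0:       # runs on sampled frames only
            for tid, crop in tracks:
                last_labels[tid] = recognize(tid, crop)
        yield {tid: last_labels.get(tid) for tid, _ in tracks}
```

Raising the interval from 0 to 10 trades label latency for throughput, which is how the reported rate climbs from 15 to 24 frame/s.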