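A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships

LIU Zhao^{1, 2, 3}, ZHOU Zhuangzhuang^{1, 2}, ZHANG Mingyang^{4}, LIU Jingxian^{1, 2, 3}

1. School of Navigation, Wuhan University of Technology, Wuhan 430063, China
2. Hubei Key Laboratory of Inland Shipping Technology, Wuhan University of Technology, Wuhan 430063, China
3. National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063, China
4. School of Engineering, Department of Mechanical Engineering, Aalto University, Espoo 20110, Finland

doi: 10.3963/j.jssn.1674-4861.2022.03.007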

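Funding:

National Natural Science Foundation of China (52171351)

About the author:
LIU Zhao (b. 1986), Ph.D., associate professor. Research interests: multi-ship intelligence mining and application, intelligent ship organization and scheduling, ship risk computation and autonomous navigation. E-mail: zhaoliu@whut.edu.cn

Corresponding author:
LIU Jingxian (b. 1967), Ph.D., professor. Research interests: traffic environment and safety assurance. E-mail: ljxteacher@sohu.com

CLC number: U675.96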
Publication history
- Received: 2022-02-16
- Published online: 2022-07-25

Abstract: To meet the development needs of autonomous navigation for intelligent ships, and to address the low learning efficiency, weak generalization ability, and poor robustness in complex encounter scenarios of reinforcement-learning-based ship collision avoidance decision-making methods, an autonomous ship collision avoidance method based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is studied. The method targets the high dimensionality of collision avoidance decision information and the continuity of ship maneuvers, and takes the rationality and real-time performance of decisions into account. Based on the relative motion information and collision risk information between ships, a state space containing target-ship information at multiple consecutive time steps is constructed from a global perspective. A continuous decision action space is designed according to ship maneuverability. A reward function for ship motion is designed that jointly considers goal orientation, course keeping, collision risk, the International Regulations for Preventing Collisions at Sea, 1972 (COLREGs), and good seamanship.
Based on the TD3 algorithm and the structure of the state space, the autonomous collision avoidance network model is designed with an Actor-Critic structure that combines Long Short-Term Memory (LSTM) networks and one-dimensional convolutional networks. Network training is stabilized by clipped double Q-learning with twin value networks, target policy smoothing, and delayed policy network updates, and is accelerated by frame skipping and by dynamically increasing the batch size and the number of updates per iteration. To address the weak generalization ability of the model, a TD3-based training procedure with random ship encounter scenarios is proposed, so that the trained model transfers across multiple scenarios. The trained model is verified by simulation and compared with a modified Artificial Potential Field (APF) algorithm. The results show that the proposed method learns efficiently and converges quickly and stably. The trained model enables the ship to pass at a safe distance in both two-ship and multi-ship encounter scenarios, and in complex encounter scenarios it achieves a higher collision avoidance success rate than the modified APF algorithm: 99.233% when avoiding 2~4 target ships, 97.600% for 5~7 target ships, and 94.166% for 8~10 target ships. The proposed method responds effectively to uncoordinated actions by approaching ships, makes safe and reasonable decisions in real time, changes course quickly and smoothly with little oscillation, and produces smooth avoidance paths, outperforming the modified APF method.
Keywords:
- traffic information engineering /
- ship collision avoidance /
- intelligent decision-making /
- deep reinforcement learning /
- twin delayed deep deterministic policy gradient
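
The abstract names clipped double Q-learning and target policy smoothing as two of the stabilizing techniques of TD3. The following is a minimal scalar sketch of how these two steps are commonly written, not the authors' implementation. The function names, the action range [-1, 1], and the noise clip of 0.5 are assumptions made for illustration. The discount factor 0.94 and the smoothing noise variance of 1 (standard deviation 1) follow Table 2 of this article.

```python
import random


def smoothed_target_action(target_action, noise_std=1.0, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: perturb the target action with clipped Gaussian noise.

    noise_clip, low and high are illustrative assumptions, not values from the article.
    """
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    # Keep the perturbed action inside the (assumed) valid action range.
    return max(low, min(high, target_action + noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.94):
    """Clipped double Q-learning target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


if __name__ == "__main__":
    a_next = smoothed_target_action(0.2, rng=random.Random(0))
    y = td3_target(reward=1.0, done=False, q1_next=10.0, q2_next=8.0)
    print(a_next, y)
```

Taking the minimum of the two target critics counters the overestimation bias of a single value network, and the clipped noise keeps the value target from exploiting sharp peaks in the critic.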

Figure 1. Ship autonomous collision avoidance framework
Figure 2. Reinforcement learning fundamentals diagram
Figure 3. Collision risk judgment diagram
Figure 4. Collision avoidance strategy in ship encounter situations
Figure 5. Actor network structure for ship autonomous collision avoidance
Figure 6. Training process of the ship autonomous collision avoidance algorithm
Figure 7. Cumulative reward curve
Figure 8. Ship trajectories in the overtaking situation
Figure 9. Course change curve of own ship in the overtaking situation
Figure 10. Curve of distance between ships in the overtaking situation
Figure 11. Simulation results of the left-turn scenario in the overtaking situation
Figure 12. Ship trajectories in the head-on situation
Figure 13. Course change curve of own ship in the head-on situation
Figure 14. Course change curve of the target ship in the head-on situation
Figure 15. Curve of distance between ships in the head-on situation
Figure 16. Ship trajectories in the crossing situation
Figure 17. Course change curve of own ship in the crossing situation
Figure 18. Curve of distance between ships in the crossing situation
Figure 19. Ship trajectories in the multi-ship encounter scenario
Figure 20. Course change curve of own ship in the multi-ship encounter scenario
Figure 21. Curve of distance between own ship and target ships under the TD3 collision avoidance algorithm
Figure 22. Curve of distance between own ship and target ships under the APF collision avoidance algorithm

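Table 1. Experimental environment

Category	Item	Configuration
Hardware	Processor (CPU)	AMD Ryzen 9 5900X
Hardware	Graphics card (GPU)	NVIDIA GeForce RTX 3080 Ti 12G
Hardware	Memory	G.Skill 32G/3600MHz
Software	Operating system	Windows 10 (64-bit)
Software	Programming language	Python 3.9.7
Software	Deep learning framework	TensorFlow 2.6.2
Software	Reinforcement learning environment	OpenAI Gym 0.19.0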

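Table 2. Training parameters of the ship collision avoidance algorithm

Parameter	Value	Parameter	Value
Number of iterations	1.5×10⁶	Discount factor	0.94
Replay buffer capacity	1×10⁶	Exploration noise variance	0.5
Network learning rate	0.0003	Smoothing noise variance	1
Batch size	128~256	Delayed update frequency	4
Updates per iteration	1~2	Soft update rate	0.005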
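
Several of the values in Table 2 map onto small, standard pieces of a TD3 training loop. The sketch below shows one plausible reading, using plain Python lists in place of network weights. The helper names are hypothetical, and the linear batch-size schedule is an assumption: the table gives only the range 128~256, not how the size grows during training.

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


def should_update_actor(critic_step, policy_delay=4):
    """Delayed policy update: refresh the actor once every `policy_delay` critic updates."""
    return critic_step % policy_delay == 0


def batch_size_at(step, total_steps=1_500_000, start=128, end=256):
    """Assumed linear growth of the batch size from `start` to `end` over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return int(round(start + frac * (end - start)))


if __name__ == "__main__":
    print(soft_update([0.0, 1.0], [1.0, 1.0]))
    print([s for s in range(1, 13) if should_update_actor(s)])
    print(batch_size_at(0), batch_size_at(750_000), batch_size_at(1_500_000))
```

A small soft update rate such as 0.005 makes the target networks trail the online networks slowly, which keeps the value targets stable between updates.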

Table 3. Initial settings of target ships in the multi-ship encounter scenario

Ship	Initial position	Course/(°)	Speed/(n mile/h)
Target ship 1 (TS1)	(7, 0)	315	12
Target ship 2 (TS2)	(7.5, 7)	225	12
Target ship 3 (TS3)	(6, 8)	180	5
Target ship 4 (TS4)	(4, 8)	120	4
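
To make the settings in Table 3 concrete, the sketch below converts each course and speed into velocity components and computes the distance and time at the closest point of approach (DCPA and TCPA), a standard way to quantify collision risk from relative motion. This page does not state which risk measure the article uses, so DCPA/TCPA is an assumption here. The own-ship state in the usage lines is also hypothetical, as are the conventions that positions are in nautical miles with x pointing east and y pointing north.

```python
import math


def velocity(course_deg, speed_kn):
    """East/north velocity components; course is measured clockwise from north."""
    c = math.radians(course_deg)
    return speed_kn * math.sin(c), speed_kn * math.cos(c)


def cpa(own_pos, own_course, own_speed, ts_pos, ts_course, ts_speed):
    """Return (DCPA in n mile, TCPA in hours) for straight-line, constant-speed motion."""
    ovx, ovy = velocity(own_course, own_speed)
    tvx, tvy = velocity(ts_course, ts_speed)
    rx, ry = ts_pos[0] - own_pos[0], ts_pos[1] - own_pos[1]  # relative position
    vx, vy = tvx - ovx, tvy - ovy                            # relative velocity
    v2 = vx * vx + vy * vy
    if v2 == 0.0:
        # No relative motion: the separation never changes.
        return math.hypot(rx, ry), 0.0
    tcpa = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa


if __name__ == "__main__":
    # Target ships from Table 3: (position, course in degrees, speed in knots).
    targets = {
        "TS1": ((7.0, 0.0), 315.0, 12.0),
        "TS2": ((7.5, 7.0), 225.0, 12.0),
        "TS3": ((6.0, 8.0), 180.0, 5.0),
        "TS4": ((4.0, 8.0), 120.0, 4.0),
    }
    # Hypothetical own ship: at the origin, course 045 degrees, 12 knots.
    for name, (pos, course, speed) in targets.items():
        dcpa, tcpa = cpa((0.0, 0.0), 45.0, 12.0, pos, course, speed)
        print(f"{name}: DCPA = {dcpa:.2f} n mile, TCPA = {tcpa:.2f} h")
```

A negative TCPA means the closest point of approach has already passed, so the pair is opening rather than closing.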

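Table 4. Results of the comparison experiments

Method	Number of target ships	Success rate/%	Mean path length/n mile
TD3 collision avoidance algorithm	2~4	99.233	15.823
TD3 collision avoidance algorithm	5~7	97.600	17.951
TD3 collision avoidance algorithm	8~10	94.166	19.894
Modified APF collision avoidance algorithm	2~4	97.700	14.978
Modified APF collision avoidance algorithm	5~7	93.766	16.163
Modified APF collision avoidance algorithm	8~10	89.033	17.394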
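
The trade-off in Table 4 is easier to see as differences. The short script below uses only the numbers from the table and computes, for each traffic density, the success-rate gain of TD3 over the modified APF algorithm in percentage points and the relative increase in mean path length. The variable and function names are illustrative.

```python
# Values copied from Table 4: (success rate in %, mean path length in n mile).
RESULTS = {
    "2~4": {"TD3": (99.233, 15.823), "APF": (97.700, 14.978)},
    "5~7": {"TD3": (97.600, 17.951), "APF": (93.766, 16.163)},
    "8~10": {"TD3": (94.166, 19.894), "APF": (89.033, 17.394)},
}


def compare(td3, apf):
    """Return (success gain in percentage points, extra path length in %)."""
    success_gain = td3[0] - apf[0]
    extra_path_pct = (td3[1] / apf[1] - 1.0) * 100.0
    return round(success_gain, 3), round(extra_path_pct, 2)


if __name__ == "__main__":
    for ships, row in RESULTS.items():
        gain, extra = compare(row["TD3"], row["APF"])
        print(f"{ships} target ships: +{gain} points success, +{extra}% path length")
```

The success-rate advantage of TD3 widens as traffic gets denser (1.533, 3.834, and 5.133 percentage points), while its mean path is longer than that of the modified APF algorithm by roughly 5.6%, 11.1%, and 14.4% respectively.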

