A Merging Model Based on Piecewise Deep Reinforcement Learning for Connected and Autonomous Vehicle in Work Zone under Mixed Autonomy

doi: 10.3963/j.jssn.1674-4861.2025.02.011

XIN Qi¹,
JIA Shengqi¹,
XU Meng²,
QI Jiale¹,
YUAN Wei¹

1. School of Automobile, Chang'an University, Xi'an 710064, China
2. School of System Science, Beijing Jiaotong University, Beijing 100044, China

Funding:

National Natural Science Foundation of China (52002035)

Key Research and Development Program of Shaanxi Province (2024CY2-GJHX-87)

Natural Science Basic Research Program of Shaanxi Province (2025JC-YBMS-395)

Corresponding author:
XIN Qi (born 1987), Ph.D., associate professor. Research interests: traffic information engineering and control, traffic safety. E-mail: xinqi@chd.edu.cn

CLC number: U491.4
Publication history
- Received: 2024-07-02
- Published online: 2025-09-29

Abstract

Abstract: Classical early merge and late merge strategies adapt poorly to dynamic traffic demand, and upstream speed differences cause merging vehicles to become "misaligned" with their target gaps. To address these problems, a piecewise merging control model based on deep reinforcement learning is proposed for connected and autonomous vehicles (CAVs) in work zones under mixed autonomy. By performing speed guidance, gap creation, and position alignment in sequence, the model resolves the merging conflicts and efficiency loss that arise when several vehicles in the closed lane request the same gap in the open lane during the lane-changing period. The model combines longitudinal trajectory control based on the soft actor-critic (SAC) algorithm with rule-based lane-changing decisions to jointly optimize the merging trajectory. For longitudinal trajectory optimization, nine features are selected as the agent state to describe the local and global traffic conditions around the ego vehicle: the speed and acceleration of the ego vehicle, the speed of and distance to the lead vehicle, the speeds of and distances to the lead and lag vehicles in the adjacent left lane, and the distance to the merging point. A piecewise reward function for the work zone is then constructed from the perspectives of comfort, safety, and efficiency. Its objectives are reducing the magnitude of acceleration and its rate of change (jerk), avoiding collisions, creating merging gaps, aligning with the center of the gap in the open lane, suppressing speed differences with the lead and lag vehicles, following the advisory speed, and encouraging the lag vehicle to yield. In particular, an efficiency penalty term based on the speed difference between the ego vehicle and the lag vehicle in the target lane reduces the frequent stopping delays of both CAVs and human-driven vehicles at the merging point in mixed traffic flow.
Simulation results show that, under medium and high traffic demand, the proposed model increases the average speed by about 4.76% and the minimum time-to-collision by about 19.71% compared with the early merge (EM), late merge (LM), and New England merge (NEM) methods, improving both traffic efficiency and safety in the work zone. In addition, in mixed traffic with heterogeneous human-driven vehicles, the average speed, minimum time-to-collision, and merging success rate all increase as the CAV market penetration rate rises, and merging is achieved without stopping.

Keywords:
- intelligent transportation
- work zone merging
- merging control model
- soft actor-critic algorithm
- mixed traffic flow

Figure 1. Layout and merging strategy of the work zone on a two-way four-lane freeway

Figure 2. Competing merging gap ("misalignment") problem of the predictive merging strategy

Figure 3. Integrated lateral and longitudinal control process

Figure 4. Framework of the soft actor-critic (SAC) algorithm

Figure 5. Effective range of the reward function for smooth driving

Figure 6. Effective range of the reward function for driving safety

Figure 7. Effective range of the reward function for driving efficiency

Figure 8. Work zone on the Xi'an Ring Expressway, west of the Bahe River

Figure 9. Longitudinal control in the gap creation zone

Figure 10. Simulation environment settings in SUMO

Figure 11. SAC training result curves

Figure 12. Vehicle position trajectories under RLM

Figure 13. Merging performance under different CAV market penetration rates

Figure 14. Speed matching area

Figure 15. Gap creation area

Figure 16. Position alignment area

Figure 17. Merging area

Figure 18. Space-time trajectories of vehicles that failed to merge under different strategies

Figure 19. Density of each lane under LM and RLM control at 1 500 pcu/h

Figure 20. Density of each lane under LM and RLM control at 2 000 pcu/h

Figure 21. Acceleration distributions of different strategies under different traffic volumes

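Table 1. State of agents

State feature	Meaning
v_{h, t}^{e}	Speed of vehicle h (the ego vehicle) at time t
a_{h, t}^{e}	Acceleration of vehicle h (the ego vehicle) at time t
p_{h, t}^{e}	Distance from vehicle h to the merging point at time t
v_{h, t}^{f}	Speed of the lead vehicle of vehicle h at time t
s_{h, t}^{f}	Gap between vehicle h and its lead vehicle at time t
v_{h, t}^{lf}	Speed of the lead vehicle in the lane adjacent to vehicle h at time t
s_{h, t}^{lf}	Gap between vehicle h and the lead vehicle in the adjacent lane at time t
v_{h, t}^{lb}	Speed of the lag vehicle in the lane adjacent to vehicle h at time t
s_{h, t}^{lb}	Gap between vehicle h and the lag vehicle in the adjacent lane at time t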
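
As an illustration of how the nine features in Table 1 could be packed into a single observation for the agent, here is a minimal Python sketch. The class name `MergeState`, the field names, and the feature ordering are assumptions made for illustration; the source lists the features but does not state how they are ordered or normalized.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class MergeState:
    """Nine state features of one CAV at one time step (hypothetical layout)."""

    v_ego: float       # v^e: ego speed, m/s
    a_ego: float       # a^e: ego acceleration, m/s^2
    p_merge: float     # p^e: distance to the merging point, m
    v_lead: float      # v^f: speed of the lead vehicle, m/s
    s_lead: float      # s^f: gap to the lead vehicle, m
    v_adj_lead: float  # v^lf: speed of the adjacent-lane lead vehicle, m/s
    s_adj_lead: float  # s^lf: gap to the adjacent-lane lead vehicle, m
    v_adj_lag: float   # v^lb: speed of the adjacent-lane lag vehicle, m/s
    s_adj_lag: float   # s^lb: gap to the adjacent-lane lag vehicle, m

    def to_vector(self) -> List[float]:
        """Flatten the features in declaration order (ordering is an assumption)."""
        return list(astuple(self))


# Illustrative values only; they are not taken from the paper.
example_state = MergeState(25.0, 0.2, 600.0, 24.0, 35.0, 26.0, 40.0, 27.0, 30.0)
observation = example_state.to_vector()
```

Keeping the features in a named structure rather than a bare list makes it harder to swap, say, the lead and lag gaps by accident when the observation is built from simulator output.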


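Table 2. Model parameter settings

Parameter	Value	Parameter	Value
Discount factor γ	0.99	Longitudinal collision penalty coefficient δ	5
Replay buffer capacity N_rb	648 000	Lateral collision penalty coefficient κ	4
Batch size N_b	128	Position reward coefficient ε	0.3
Number of hidden-layer neurons n_h	256	Smoothing reward coefficient η	1.5
Delayed update steps τ_step	3	Smooth-driving penalty coefficient λ	0.02
Mean of policy network distribution μ_p	0.001	Desired speed inside the matching area v_in /(m/s)	25
Standard deviation of policy network distribution σ_p	0.001	Desired speed outside the matching area v_out /(m/s)	30
Actor learning rate l_r-actor	0.000 3	Maximum speed limit v_max /(m/s)	33.33
Critic learning rate l_r-critic	0.000 3	Minimum speed limit v_min /(m/s)	16.67
Vehicle safety parameter σ_v	0.5	Linear adjustment parameter for the closed lane α	0.125
Standstill distance s_CC0 /m	1.5	Linear adjustment parameter for the open lane β	0.25
Car-following random oscillation distance s_CC2 /m	4	Efficiency lane-change threshold i /m	30
Safe time headway s_CC1c /s	1.7	Start of the gap creation area p^ges /m	450
Double safe time headway s_CC1o /s	3.8	Start of the position alignment area p^pae /m	850
Comfort reward weight ω_a	0.1	End of the position alignment area p^pas /m	1 650
Safety reward weight ω_s	0.5	End of the merging area p^me /m	1 850
Efficiency reward weight ω_e	0.4	Vehicle length L₂ /m	5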
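
To make the role of the zone boundaries and reward weights in Table 2 more concrete, the sketch below maps a longitudinal position to a control zone and combines three sub-rewards with the listed weights. It assumes that the zones follow one another in the order speed matching, gap creation, position alignment, merging (as the figure captions suggest), that position is measured in the same coordinate as the boundaries, and that the total reward is a plain weighted sum. The function names and the sub-reward inputs are hypothetical, and the paper's actual reward terms are not reproduced here.

```python
from bisect import bisect_right

# Zone boundaries from Table 2, in metres.
ZONE_BOUNDS_M = [450.0, 850.0, 1650.0, 1850.0]

# Assumed zone order; "downstream" is a label added here for positions
# beyond the end of the merging area.
ZONE_NAMES = [
    "speed_matching",
    "gap_creation",
    "position_alignment",
    "merging",
    "downstream",
]

# Reward weights from Table 2 (comfort, safety, efficiency).
W_COMFORT = 0.1
W_SAFETY = 0.5
W_EFFICIENCY = 0.4


def zone_of(position_m: float) -> str:
    """Return the zone containing a position; each boundary opens the next zone."""
    return ZONE_NAMES[bisect_right(ZONE_BOUNDS_M, position_m)]


def total_reward(r_comfort: float, r_safety: float, r_efficiency: float) -> float:
    """Weighted sum of the three sub-rewards (the weighted-sum form is an assumption)."""
    return W_COMFORT * r_comfort + W_SAFETY * r_safety + W_EFFICIENCY * r_efficiency
```

With these boundaries, `zone_of(450.0)` falls in the gap creation zone and `zone_of(1700.0)` in the merging zone. Because the three weights sum to 1, sub-rewards that all equal 1 give a total of 1.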


Table 3. Simulation results for mixed traffic flow with heterogeneous HDVs

Flow/(pcu/h)	CAV penetration rate	Average speed/(m/s)	Minimum TTC/s	Merging success rate/%
1 000	0.2	26.15	0.71	97.20
	0.4	26.31	1.32	97.70
	0.6	26.65	2.26	98.80
	0.8	27.38	4.03	99.20
1 500	0.2	23.79	0.61	95.47
	0.4	24.82	0.80	96.07
	0.6	25.64	1.00	97.93
	0.8	26.95	1.41	98.27
2 000	0.2	22.89	0.51	91.65
	0.4	25.22	0.53	93.70
	0.6	25.78	0.61	95.30
	0.8	26.76	0.74	97.25
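
The minimum TTC column uses time-to-collision as a surrogate safety measure. A minimal sketch of the common car-following definition follows: the gap divided by the closing speed, taken as infinite when the follower is not closing in on the leader. The function names and the sample values are illustrative assumptions; the source does not state exactly how TTC was extracted from the simulation.

```python
import math
from typing import Iterable, Tuple


def time_to_collision(gap_m: float, v_follower: float, v_leader: float) -> float:
    """TTC in seconds for one follower-leader pair; inf if the gap is not shrinking."""
    closing_speed = v_follower - v_leader
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def min_ttc(samples: Iterable[Tuple[float, float, float]]) -> float:
    """Smallest TTC over (gap_m, v_follower, v_leader) samples; inf if there are none."""
    return min((time_to_collision(*sample) for sample in samples), default=math.inf)


# Illustrative samples only: (gap in m, follower speed in m/s, leader speed in m/s).
demo_samples = [(20.0, 25.0, 20.0), (10.0, 30.0, 20.0), (50.0, 20.0, 20.0)]
demo_min_ttc = min_ttc(demo_samples)
```

Under this definition, a smaller minimum TTC means that at some instant a vehicle came closer to a rear-end conflict, which is why larger values in the table are read as safer.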


Table 4. Comparison of merging performance

Flow/(pcu/h)	Merging strategy	Merging success rate/%	Average speed/(m/s)	Minimum TTC/s
1 000	EM	100	28.61	6.43
	LM	100	29.03	4.01
	NEM	100	29.13	5.87
	RLM	100	28.69	6.74
1 500	EM	98.89	27.55	1.73
	LM	98.51	27.94	1.66
	NEM	99.75	28.51	1.98
	RLM	100	28.58	2.11
2 000	EM	89.39	25.12	0.76
	LM	92.39	24.39	0.88
	NEM	98.05	27.58	0.95
	RLM	99.35	28.23	1.07
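
The relative gains implied by Table 4 can be checked row by row. The sketch below computes the percentage change of the RLM row against one baseline at a single flow level, using values copied from the table. The helper names are hypothetical. The source does not state how the headline percentages in the abstract were aggregated across baselines and flow levels, so this sketch does not attempt to reproduce them.

```python
from typing import Dict, Tuple

# (average speed in m/s, minimum TTC in s) at 2 000 pcu/h, copied from Table 4.
TABLE4_FLOW_2000: Dict[str, Tuple[float, float]] = {
    "EM": (25.12, 0.76),
    "LM": (24.39, 0.88),
    "NEM": (27.58, 0.95),
    "RLM": (28.23, 1.07),
}


def percent_gain(new: float, baseline: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def gains_vs(baseline: str, proposed: str = "RLM") -> Tuple[float, float]:
    """Percentage gains in average speed and minimum TTC over one baseline."""
    speed_new, ttc_new = TABLE4_FLOW_2000[proposed]
    speed_base, ttc_base = TABLE4_FLOW_2000[baseline]
    return percent_gain(speed_new, speed_base), percent_gain(ttc_new, ttc_base)


speed_gain_nem, ttc_gain_nem = gains_vs("NEM")
```

For the NEM baseline at 2 000 pcu/h this gives roughly 2.36% for average speed and 12.63% for minimum TTC; the gains over EM and LM at the same flow are larger.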


Table 5. Model performance under different reinforcement learning methods

Flow/(pcu/h)	Model	Average speed/(m/s)	Minimum TTC/s	Average number of stops/(stops per lane)
1 000	DDPG-M	27.23	6.89	0.05
	TD3-M	27.69	6.07	0.01
	SAC-NEM	29.13	5.87	0.02
	RLM	28.69	6.74	0.00
1 500	DDPG-M	27.21	2.17	0.07
	TD3-M	27.63	2.01	0.03
	SAC-NEM	28.51	1.98	0.09
	RLM	28.58	2.11	0.01
2 000	DDPG-M	26.93	0.75	0.09
	TD3-M	27.58	0.72	0.09
	SAC-NEM	27.58	0.95	0.31
	RLM	28.23	1.07	0.03


Table 6. Results of comfort simulation

Flow/(pcu/h)	Strategy	Total weighted root-mean-square acceleration/(m/s²)
1 500	EM	1.14
	LM	1.13
	RLM	0.73
2 000	EM	1.26
	LM	1.32
	RLM	0.781

