Abstract
Model-based reinforcement learning (RL) is expected to achieve higher sample efficiency than
model-free RL by leveraging a virtual environment model. However, obtaining sufficiently accurate
representations of environmental dynamics is challenging because of uncertainties in complex systems and
environments. An inaccurate environment model may degrade the sample efficiency and performance of
model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires
substantial training time to learn from scratch, potentially limiting its advantages over model-free
approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual
reinforcement learning framework that enhances learning efficiency by infusing established expert
knowledge into the learning process, so that the agent does not begin from zero. Our approach
integrates traffic expert knowledge into a virtual environment model, employing the intelligent driver
model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to
complex scenarios. We propose a novel strategy that combines traditional control methods with residual
RL, enabling efficient learning and policy optimization without learning from scratch. The
proposed approach is applied to connected automated vehicle (CAV) trajectory control tasks for the
dissipation of stop-and-go waves in mixed traffic flows. Experimental results demonstrate that our
approach enables the CAV agent to outperform the baseline agents in trajectory control in terms of
sample efficiency, traffic flow smoothness, and traffic mobility.
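To make the two residual components concrete, the sketch below pairs an IDM-based virtual dynamics model with a neural residual term and composes a base controller's command with an RL residual action. This is a minimal illustration, not the paper's implementation: the three-dimensional state (gap, ego speed, speed difference to the leader), the IDM parameter values, and all class and function names are our assumptions.

```python
import math
import torch
import torch.nn as nn

class ResidualDynamics(nn.Module):
    """Virtual environment model: IDM physics prior plus a learned residual.

    IDM parameters are textbook defaults (illustrative assumptions).
    """
    def __init__(self, state_dim=3, hidden=64,
                 v0=30.0, T=1.5, s0=2.0, a_max=1.0, b=1.5, delta=4.0):
        super().__init__()
        self.v0, self.T, self.s0 = v0, T, s0
        self.a_max, self.b, self.delta = a_max, b, delta
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def idm_accel(self, state):
        # state = (gap to leader, ego speed, speed difference ego - leader)
        gap, v, dv = state[..., 0], state[..., 1], state[..., 2]
        s_star = self.s0 + v * self.T + v * dv / (2 * math.sqrt(self.a_max * self.b))
        return self.a_max * (1 - (v / self.v0) ** self.delta
                             - (s_star / gap.clamp(min=0.1)) ** 2)

    def forward(self, state):
        # basic dynamics from IDM, residual dynamics from the neural network
        return self.idm_accel(state) + self.net(state).squeeze(-1)

def residual_action(a_controller, a_rl, a_bound=3.0):
    """Residual policy: base controller command plus RL residual, clipped to limits."""
    return torch.clamp(a_controller + a_rl, -a_bound, a_bound)

# Example: predicted accelerations for a batch of (gap, speed, speed-diff) states.
model = ResidualDynamics()
states = torch.tensor([[20.0, 25.0, 1.0], [10.0, 15.0, -0.5]])
pred_accel = model(states)
```

Under this composition, the RL agent only learns a correction on top of behavior that the physics prior and base controller already approximate, which is the source of the sample-efficiency gains the abstract claims.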