A platform-agnostic deep reinforcement learning framework for effective Sim2Real transfer towards autonomous driving
D. Li and O. Okhrin
Dianzhao Li and Ostap Okhrin present a deep reinforcement learning (DRL) framework for autonomous driving that addresses the Sim2Real transfer challenge. By training lane-following and overtaking policies in simulation and decoupling perception from control, the approach carries over to the real world with consistent performance.
Introduction
The study addresses the challenge of transferring DRL policies for autonomous driving from simulation to the real world, where differences in sensing, visuals, and dynamics create a Sim2Real gap. While DRL has shown strong capabilities in complex, dynamic decision-making tasks, most Sim2Real driving studies focus on single subtasks like lane following, with little work on overtaking or combined lane following and overtaking in real-world settings. The paper proposes a framework that separates a platform-dependent perception module from a universal DRL control module to abstract away platform heterogeneity and reduce visual redundancy, enabling robust transfer across simulators and to real robot platforms. The goal is to train a single agent capable of lane following and safe overtaking that generalizes across environments and operates effectively in real-world conditions.
Literature Review
Prior DRL research has addressed individual driving subtasks including car following, lane keeping, lane changing, overtaking, and collision avoidance. Sim2Real transfer techniques such as domain randomization, domain adaptation, knowledge distillation, meta-RL, and robust RL have narrowed the simulation–reality gap, enabling real-world lane following and collision avoidance. However, existing works largely focus on single tasks (typically lane following) and do not address overtaking or joint lane following plus overtaking with real-world deployment. This gap motivates a framework that generalizes across platforms and tasks.
Methodology
Framework: The approach comprises (i) a platform-dependent perception module that extracts task-relevant affordances (e.g., lateral displacement and orientation deviation relative to lane) and (ii) a universal DRL control module trained in simulation and transferred across platforms. The DRL agent is LSTM-based to capture temporal dependencies essential for overtaking.
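The separation can be pictured as a thin interface between the two modules. Below is a minimal Python sketch of that interface; the class and method names are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class Affordances:
    """Task-relevant state, abstracted away from any specific platform."""
    lateral_displacement: float   # d: signed offset from the right-lane center [m]
    orientation_deviation: float  # theta: heading error w.r.t. the lane direction [rad]

class PerceptionModule:
    """Platform-dependent: maps raw camera images to affordances."""
    def estimate(self, image) -> Affordances:
        raise NotImplementedError  # one implementation per simulator/robot

class ControlModule:
    """Universal: an LSTM-based DRL policy that consumes affordances only."""
    def act(self, aff: Affordances, speed: float, hidden_state=None):
        # Returns (speed command, steering angle) plus the updated LSTM hidden
        # state, which carries the temporal context needed for overtaking.
        raise NotImplementedError
```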
Perception module: A multi-step image pipeline compensates for illumination by running k-means clustering on sampled pixels, matching clusters to expected colors (red/yellow/white/gray), and fitting an affine RGB transformation for color balance. Edges are detected with the Canny detector, colors via HSV thresholding, and line segments via the probabilistic Hough transform. Detected features are reprojected to the world frame using camera calibration. A nonlinear, non-parametric histogram filter then estimates the lateral displacement d and angle offset θ relative to the right-lane center. These affordances feed the control module. In real-world tests, ground-truth values for error metrics are unavailable, so perception is quantitatively assessed in simulation via RMSE.
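The edge- and line-extraction steps can be sketched with standard OpenCV calls, as below; the HSV thresholds and Hough parameters are placeholders, not the paper's calibrated values.

```python
import cv2
import numpy as np

def extract_line_segments(frame_bgr: np.ndarray) -> np.ndarray:
    """Canny edges + HSV color masks + probabilistic Hough transform."""
    edges = cv2.Canny(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY), 80, 200)

    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    white = cv2.inRange(hsv, (0, 0, 150), (180, 60, 255))    # white lane line
    yellow = cv2.inRange(hsv, (20, 80, 80), (35, 255, 255))  # yellow center line
    color_mask = cv2.dilate(cv2.bitwise_or(white, yellow), np.ones((3, 3), np.uint8))

    # Keep only edges lying on lane-colored pixels, then fit line segments.
    masked = cv2.bitwise_and(edges, color_mask)
    segments = cv2.HoughLinesP(masked, rho=1, theta=np.pi / 180, threshold=20,
                               minLineLength=10, maxLineGap=5)
    return segments if segments is not None else np.empty((0, 1, 4), dtype=np.int32)
```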
Control module and states: During training, the agent observes the lateral displacement and angle offset from the perception module, the vehicle speed, and Vector Field Guidance (VFG) states, plus a conditional time-to-collision and a proximity flag for overtaking; all observations are normalized. The action space comprises speed and steering angle. During evaluation, the VFG states are replaced by perceptual substitutes.
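As a rough illustration, the training-time observation might be assembled as follows; the field order and normalization constants are assumptions made for this sketch.

```python
import numpy as np

def build_observation(d, theta, speed, vfg_course, ttc=None, obstacle_near=False,
                      d_max=0.25, v_max=0.5, ttc_max=5.0):
    """Normalized observation vector; ttc applies only when an obstacle is tracked."""
    ttc = ttc if ttc is not None else ttc_max
    return np.array([
        np.clip(d / d_max, -1.0, 1.0),      # lateral displacement
        theta / np.pi,                      # angle offset
        speed / v_max,                      # vehicle speed
        vfg_course / np.pi,                 # VFG state (training only)
        np.clip(ttc / ttc_max, 0.0, 1.0),   # conditional time-to-collision
        float(obstacle_near),               # proximity flag for overtaking
    ], dtype=np.float32)
```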
Vector Field Guidance (VFG): Provides desired course angle based on cross-track error relative to a target path, aiding stable lane following during training.
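A standard straight-line vector-field guidance law is sketched below; the paper's exact parameterization may differ, so chi_inf and k here are illustrative.

```python
import math

def vfg_desired_course(path_course, cross_track_error, chi_inf=math.pi / 2, k=4.0):
    """Desired course angle that decays toward the path course as the
    cross-track error shrinks; chi_inf is the approach angle far from the
    path and k sets how aggressively the field steers the vehicle back."""
    return path_course - chi_inf * (2.0 / math.pi) * math.atan(k * cross_track_error)
```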
Reward: The reward balances safety and performance: a -1 penalty for collisions or boundary violations; otherwise a weighted sum of lane-following (cross-track error), velocity/efficiency (and overtaking incentives), and heading components with weights w_c=0.3, w_v=0.6, w_a=0.1.
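Only the weights and the -1 terminal penalty are stated above; the per-term shaping functions in the following sketch are assumptions.

```python
import math

def reward(collision, off_road, cross_track_err, speed, heading_err,
           w_c=0.3, w_v=0.6, w_a=0.1, d_max=0.25, v_max=0.5):
    if collision or off_road:
        return -1.0  # safety penalty replaces the weighted sum
    r_lane = 1.0 - min(abs(cross_track_err) / d_max, 1.0)  # lane-following term
    r_vel = speed / v_max                                  # velocity/efficiency term
    r_head = 1.0 - min(abs(heading_err) / math.pi, 1.0)    # heading term
    return w_c * r_lane + w_v * r_vel + w_a * r_head
```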
Baselines: PID-family controllers (a simulation PD controller with privileged lane-center information; a real-world PI controller using perception outputs), a human baseline in simulation (25 participants who received training and were scored on their best of six attempts), and two DRL baselines: (a) an end-to-end (E2E) image-based DRL agent (compact CNN) and (b) a CNN perception module predicting lateral offset and angle offset, feeding an LSTM-SAC control module.
Environments and hardware: Training in ROS+Gazebo; validation in Gym-Duckietown with ROS wrappers; real-world tests with Duckiebots DB21 (Jetson Nano) and DB19 (Raspberry Pi 3B). Domain randomization and noise (Gaussian on images and controls) improve robustness. Real-world tests vary lane color, width, lighting, and platform.
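The Gaussian image and control noise can be reproduced with a few lines of NumPy; the standard deviations below are placeholders rather than the tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_image(img, sigma=8.0):
    """Additive Gaussian pixel noise, clipped back to valid intensities."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def perturb_action(action, sigma=0.02, low=-1.0, high=1.0):
    """Additive Gaussian control noise, clipped to the action bounds."""
    return np.clip(np.asarray(action) + rng.normal(0.0, sigma, np.shape(action)),
                   low, high)
```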
Training: Algorithms include TD3, SAC, LSTM-TD3, and LSTM-SAC. Agents trained for ~1.5M timesteps (per seed) with 10 seeds on an RTX 3080 (~40 hours per 1M steps). LSTM-based agents converge; feedforward TD3/SAC fail on overtaking due to partial observability, highlighting the importance of recurrence.
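The recurrence these results point to can be as simple as an LSTM layer ahead of the policy head. A minimal PyTorch sketch follows; the layer sizes and overall architecture are illustrative, not the authors' exact network.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Recurrent actor of the kind used by LSTM-SAC/LSTM-TD3."""
    def __init__(self, obs_dim=6, act_dim=2, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs_seq, hc=None):
        # obs_seq: (batch, time, obs_dim). The carried hidden state hc is what
        # lets the agent act under the partial observability of overtaking.
        out, hc = self.lstm(obs_seq, hc)
        return self.head(out), hc
```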
Evaluation protocol: Lane following in Gym-Duckietown on five maps (Normal 1, Normal 2, Plus, Zig-Zag, V track) with 100 episodes, randomized starts; metrics include survival time, distance, lateral deviation, orientation deviation, and infractions; a composite score aggregates metrics. Overtaking evaluated on three maps with success rate plus lane metrics. Real-world evaluations on a circular map and five additional maps; metrics include lateral/orientation deviation (from perception), average speed, and infractions; overtaking scenarios include static obstacles and dynamic overtaking of a PID-led slower vehicle.
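The per-episode metrics named above could be collected with a helper like the hypothetical one below; the paper's composite-score formula is not reproduced here.

```python
import numpy as np

def episode_metrics(lat_devs, orient_devs, distance_m, survival_time_s, infractions):
    """Aggregate raw per-step deviations into the reported episode metrics."""
    return {
        "survival_time_s": survival_time_s,
        "distance_m": distance_m,
        "mean_lateral_dev_m": float(np.mean(np.abs(lat_devs))),
        "mean_orientation_dev_rad": float(np.mean(np.abs(orient_devs))),
        "infractions": infractions,
    }
```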
Key Findings
Simulation lane following: Across five Gym-Duckietown maps and 100-episode evaluations, the DRL agent in fast mode achieved the highest final scores and traveled distances on every map. Relative to a near-optimal PID baseline (with privileged state), the DRL agent drove approximately 50–70% faster while maintaining competitive or better deviations and infractions; the slow-mode DRL agent outperformed PID on lateral and orientation deviations at similar speeds. The DRL agent also outperformed the best human baseline (best of six attempts) in speed, deviations, and infractions.
Simulation overtaking: On Normal 1, Normal 2, and Zig-Zag maps, overtaking success rates were 94.74%, 94.44%, and 90.91%, respectively (10-episode medians). Lateral and orientation deviations remained comparable to pure lane following; infractions were higher due to left-lane use during overtaking. The agent returned to the right lane after overtakes.
Real-world lane following: On DB21 hardware, the proposed DRL agent exhibited smooth trajectories and consistent performance on inner and outer rings across five maps, outperforming PID in average speed and infractions. In controlled tests with three DB21 vehicles on a circular track, the DRL agent achieved up to a 65% higher average speed than PID while maintaining robust lane keeping. E2E DRL and CNN-DRL baselines that performed comparably in simulation failed to transfer robustly: the E2E agent exhibited aggressive maneuvers, the CNN-based agent could not recover from left-lane excursions, and both accumulated numerous infractions under real-world variations.
Real-world overtaking: The DRL agent successfully performed overtakes around multiple static obstacles and dynamically overtook a slower PID-led vehicle, then returned to lane following. It demonstrated recovery from unfavorable post-overtake states, re-entering the lane without intervention.
Perception accuracy and Sim2Real gap: In simulation, perception RMSE for lateral deviation ranged 0.046–0.067 m (road width 0.23 m) and orientation deviation RMSE 0.548–0.836 rad across maps, yet the DRL agent remained robust to these inaccuracies. Appearance gap quantified by FID between simulated and real camera images was 198.82, indicating substantial visual differences. Despite content and appearance gaps, the agent maintained strong performance, whereas E2E and CNN baselines degraded notably under real-world variations (lane color/width).
Discussion
The findings demonstrate that separating a platform-dependent perception module from a universal LSTM-based DRL control module enables robust transfer from simulation to real robots for lane following and overtaking. The abstraction of task-relevant affordances reduces platform heterogeneity and visual redundancy, allowing a policy trained on a simple map in Gazebo to generalize to diverse Gym-Duckietown tracks and real-world tracks with minimal parameter adjustments. Despite imperfect perception (notable RMSE) and substantial Sim2Real appearance and content gaps, the agent outperformed PID (even when PID had privileged access in simulation) and human baselines in simulation, and surpassed PID in real-world speed and reliability. Recurrence (LSTM) was essential to handle partial observability in overtaking. The results underline the framework’s resilience across varying lane colors, widths, lighting, and hardware, addressing the core research question of effective Sim2Real transfer for combined lane following and overtaking.
Conclusion
This work introduces a platform-agnostic DRL framework that decouples perception and control, enabling reliable Sim2Real transfer for lane following and overtaking. Trained in a simple simulated environment, the agent generalizes across simulators and real-world platforms, outperforming PID and human baselines in simulation and exceeding PID performance in real-world speed while maintaining safe operation. The study quantifies appearance and content gaps and shows robustness to perception noise and domain shifts. Future work includes modeling platform dynamics more explicitly, incorporating recurrent architectures within perception to mitigate latency and noise, developing more realistic cooperative overtaking with multi-agent RL, and leveraging insights from the E2E agent to further refine the modular framework.
Limitations
- Vehicle dynamics are treated as black boxes; the control module issues high-level commands that may not translate optimally across different robotic systems.
- Perception and control noise, including image transfer latency, can impact performance; the current perception module lacks temporal modeling.
- The overtaking setup assumes a constant slow leader speed, which is unrealistic; cooperative multi-agent interactions were not modeled.
- Perception accuracy is limited (notable RMSE in lateral and orientation estimates), especially given narrow lane widths.
- PID comparisons in simulation used privileged information, and human evaluations used best-of attempts; although these choices are discussed, comparisons across approaches are not fully like-for-like.