
Engineering and Technology
Champion-level drone racing using deep reinforcement learning
E. Kaufmann, L. Bauersfeld, et al.
Discover how Swift, an autonomous drone racing system developed by Elia Kaufmann and colleagues, achieved world-champion-level performance in head-to-head races against human champions. Learn how deep reinforcement learning, combined with data collected in the real world, overcomes the challenges of high-speed autonomous flight.
~3 min • Beginner • English
Introduction
The study investigates whether an autonomous quadrotor can achieve world-champion-level performance in FPV drone racing using only onboard sensing and computation, without external motion-capture systems. Prior advances in deep RL have surpassed human performance in simulated and board-game environments, but translating such success to physical, high-speed, adversarial settings remains a major challenge. FPV racing demands real-time decision-making under noisy and partial observations at speeds exceeding 100 km/h and accelerations several times that of gravity. The authors present Swift, a system that integrates a learned control policy with onboard perception to compete fairly against human champions on a professional track, aiming to close the sim-to-real gap and demonstrate champion-level performance in the real world.
Literature Review
Autonomous drone racing efforts date back to 2016 competitions, with progress in deep learning-based gate detection, sim-to-real transfer, and handling perception uncertainty. The 2019 AlphaPilot competition showcased the state of the art, yet the top autonomous systems took nearly twice as long as professional human pilots. More recent systems achieved expert-level performance but relied on external motion-capture for state estimation, an advantage unavailable to human pilots. In broader AI, deep RL has achieved superhuman performance in Atari, Go, chess, shogi, StarCraft II, Dota 2, and racing simulators like Gran Turismo, largely in simulated or fully observable environments. Traditional quadrotor racing methods using trajectory planning and MPC can perform well under idealized assumptions (perfect state, simplified dynamics) but degrade markedly with noisy perception and unmodeled dynamics, failing to reach competitive lap times versus Swift or human champions even with motion-capture support.
Methodology
System architecture: Swift comprises (1) a perception system and (2) a control policy. The perception stack includes a visual–inertial odometry (VIO) module that fuses camera images (30 Hz) with IMU data (200 Hz) to produce a metric state estimate (100 Hz). A convolutional neural network detects racing-gate corners at 30 Hz; detected corners are used with a known track map and a camera-resectioning algorithm to estimate the drone’s global pose relative to the track. A Kalman filter fuses the gate-based pose with the VIO estimate to yield a robust observed state at 100 Hz. The control policy is a two-layer multilayer perceptron (2×128 units) running at 100 Hz that outputs collective thrust and body rates—the same control interface used by human pilots.
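To make the control-policy architecture concrete, below is a minimal sketch of a two-layer MLP with 2×128 hidden units that maps a fused state observation to collective thrust and body rates. It is written in PyTorch for illustration; the observation dimension and activation function are assumptions, not details reported in the paper.

```python
import torch
import torch.nn as nn

class RacingPolicy(nn.Module):
    """Two-layer MLP control policy (2x128 hidden units) mapping a fused
    state observation to collective thrust and body rates.
    obs_dim is a placeholder, not the paper's actual observation size."""

    def __init__(self, obs_dim: int = 31, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            # 4 outputs: collective thrust + body rates (roll, pitch, yaw)
            nn.Linear(hidden, 4),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = RacingPolicy()
obs = torch.randn(1, 31)   # fused state estimate, arriving at 100 Hz
action = policy(obs)       # [thrust, p, q, r] -- same interface as a human pilot
```

A network this small evaluates in well under a millisecond, which is what makes a 100 Hz control loop feasible on an onboard computer.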
RL training: The control policy is trained in simulation with model-free, on-policy deep RL (proximal policy optimization). The reward combines progress towards the center of the next gate with a perception objective that encourages keeping the next gate in the camera’s field of view, improving state-estimation reliability.
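A hedged sketch of what such a reward might look like is shown below; the weights and exact functional forms are illustrative assumptions, not the paper’s reward coefficients.

```python
import numpy as np

def racing_reward(pos_prev, pos_now, gate_center, cam_axis,
                  w_prog=1.0, w_perc=0.02):
    """Illustrative reward: progress toward the next gate center plus a
    perception term that favors keeping the gate in the camera's view.
    `cam_axis` is the camera's optical axis as a unit vector."""
    # Progress: reduction in distance to the next gate center this step.
    progress = (np.linalg.norm(gate_center - pos_prev)
                - np.linalg.norm(gate_center - pos_now))
    # Perception: alignment between the optical axis and the gate direction.
    to_gate = gate_center - pos_now
    to_gate = to_gate / np.linalg.norm(to_gate)
    alignment = float(np.dot(cam_axis, to_gate))  # cosine of the view angle
    return w_prog * progress + w_perc * alignment
```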
Sim-to-real transfer: Purely simulated training underperforms on hardware due to (1) dynamics mismatch and (2) noisy real-world perception/state estimates. The authors collect a small dataset on the physical system by flying the track with a policy trained in simulation while logging onboard observations alongside highly accurate ground-truth poses from a motion-capture system. Analysis shows perception residuals are predominantly stochastic, while dynamics residuals are largely deterministic. They fit non-parametric, data-driven residual models: Gaussian processes for perception residuals and k-nearest-neighbor regression for dynamics residual forces/torques. These residuals augment the simulator, and the policy is fine-tuned in this augmented environment to improve robustness to real-world sensing and dynamics.
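As a rough illustration of how such residual models can be fitted and queried, the sketch below uses scikit-learn with synthetic placeholder data; the state dimensionality, kernel, neighbor count, and residual magnitudes are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 12))  # placeholder logged state vectors
# Perception residuals are predominantly stochastic ...
perc_residuals = rng.normal(scale=0.05, size=(500, 3))
# ... while dynamics residuals are largely deterministic in the state.
dyn_residuals = 0.3 * states[:, :3] + rng.normal(scale=0.01, size=(500, 3))

# Gaussian process for the stochastic perception residuals.
gp = GaussianProcessRegressor().fit(states, perc_residuals)
# k-nearest-neighbor regression for the deterministic dynamics residuals.
knn = KNeighborsRegressor(n_neighbors=5).fit(states, dyn_residuals)

# During fine-tuning, the augmented simulator perturbs its nominal
# observation and dynamics models with these residuals at each step:
s = states[:1]
obs_noise = gp.sample_y(s, random_state=1)  # sampled perception residual
force_torque = knn.predict(s)               # predicted dynamics residual
```

The split matters: sampling from the GP injects realistic observation noise during training, while the kNN model corrects the simulator’s mean dynamics.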
Evaluation setup: Experiments were conducted on a professional track with seven square gates in a 30×30×8 m volume (75 m lap). Swift flew head-to-head against three champions (A. Vanover, T. Bitmatta, M. Schaepper). In each race, both drones start on a podium after an acoustic signal; the winner is the first to complete three consecutive laps, passing all gates in order. Time-trial data (single and three-lap heats) were also collected over a week of practice and during races. Ablations compare Swift’s components and evaluate traditional planning/MPC baselines under noisy conditions (details in extended data).
Key Findings
- Head-to-head outcomes: Swift won most races against each human pilot and achieved the fastest recorded race time. Aggregate results over 25 races: 15 wins, 10 losses, best time-to-finish 17.465 s (Swift). Per-opponent summaries: vs A. Vanover (9 races), vs T. Bitmatta (7), vs M. Schaepper (9), with Swift winning the majority of races against each. Of Swift’s 10 losses, 40% were due to collisions with the opponent, 40% to collisions with a gate, and 20% to being slower.
- Time-trials (single-lap medians; n in parentheses): Swift 5.52 s (n=483), Vanover 5.76 s (n=331), Bitmatta 5.96 s (n=469), Schaepper 6.80 s (n=345). Three-lap medians: Swift 16.98 s (n=115), Vanover 17.38 s (n=221), Bitmatta 17.98 s (n=338), Schaepper 21.65 s (n=202).
- Race dynamics: Swift recorded the fastest race time, beating the best human time (Vanover) by ~0.5 s. Swift tends to be faster at the start and in tight turns (e.g., the Split-S), reacting on average 120 ms faster than the human pilots at takeoff and carrying higher acceleration and speed into gate 1. In sharp turns, Swift follows tighter lines while maintaining speed, achieving higher average speed and a shorter racing line overall, and operating closer to actuation limits (thrust/power) across the race.
- Consistency: Swift exhibits lower mean and variance in lap times, persistently pushing for speed; human pilots adapt strategy, sometimes pacing slower when leading to reduce crash risk.
Discussion
The results demonstrate that a learned policy, trained in an augmented simulator with data-driven perception and dynamics residuals and using only onboard sensing, can attain and at times exceed world-champion human performance in real-world FPV racing. This directly addresses the long-standing challenge of achieving champion-level control in a physical, partially observed, high-speed setting without external infrastructure. Structural factors help explain performance differences: Swift benefits from IMU-based inertial cues (unavailable to remote human pilots) and lower sensorimotor latency (≈40 ms vs ≈220 ms for experts). Conversely, Swift’s 30 Hz camera imposes a disadvantage relative to human pilots’ 120 Hz video, potentially affecting reaction time. Behaviorally, Swift optimizes long-term rewards and can execute tight, globally optimal lines through complex maneuvers (e.g., Split-S), whereas humans tend to plan over shorter horizons and keep upcoming gates in view. The demonstrated approach highlights the promise of hybrid learning-based control grounded in realistic sensing/dynamics modeling for physical systems beyond racing drones.
Conclusion
The paper introduces Swift, an autonomous quadrotor racing system that integrates onboard perception with a deep RL control policy fine-tuned in an augmented, data-driven simulator. Swift achieves world-champion-level performance in real FPV racing without external state estimation, winning most head-to-head races against top human pilots and setting the fastest recorded race time on a professional track. This work establishes a milestone for autonomous mobile robotics and machine intelligence and suggests that hybrid learning-based approaches—combining simulation training with empirical modeling of real-world sensing and dynamics—can enable high-performance control in other physical domains, including autonomous ground vehicles, aircraft, and personal robots. Future research directions include opponent-aware racing strategies, recovery policies post-crash, broader robustness to environmental appearance and illumination changes, and generalized residual models transferable across tracks and platforms.
Limitations
- Robustness to crashes: Swift was not trained to recover after collisions; human pilots can often recover and continue.
- Environmental appearance sensitivity: The perception system assumes similar appearance to training; significant changes in illumination or visuals can cause failures. Broader training of the gate detector and observation residuals across diverse conditions is needed.
- Opponent unawareness and strategy: Swift does not adapt strategy to opponent status, always pushing for fastest expected completion, which can be suboptimal when leading or trailing.
- Sensor constraints: Camera refresh rate is 30 Hz (lower than humans’ 120 Hz), potentially limiting reaction; although overall system latency is lower, visual bandwidth may limit certain maneuvers.
- Dependence on environment- and platform-specific residuals: Residual models are identified from real-world data on the specific track and platform, which may limit immediate transfer to new settings without additional data collection.