
Engineering and Technology
Uncovering drone intentions using control physics informed machine learning
A. Perrusquía, W. Guo, et al.
Discover how the CPhy-ML framework, developed by Adolfo Perrusquía, Weisi Guo, Benjamin Fraser, and Zhuangkun Wei, infers drone intentions by blending deep learning with aerospace control and physics models. The research improves trajectory prediction and reward-function inference in both reliability and accuracy.
~3 min • Beginner • English
Introduction
The paper addresses the challenge of inferring the hidden intention of uncooperative drones, which cannot be directly observed by sensing systems. Misclassification leads to false positives or over-trusting autonomous systems around critical infrastructure. Prior approaches often rely on expert-defined low-dimensional features or purely data-driven trajectory prediction, both susceptible to cognitive, observational, and learning biases and lacking explicit use of flight physics. The authors propose a control-physics informed machine learning (CPhy-ML) framework that fuses deep learning with aerospace control and physics to regularize learning, reduce bias, and improve robustness in inferring two complementary notions of intention: trajectory intention (mission profile class and future trajectory bounds) and reward intention (the hidden objective underlying the control strategy). The aim is to improve reliability of intention inference, anomaly detection, and explainability via reward function recovery.
Literature Review
The paper reviews expert-knowledge intent classification methods that rely on geofencing, traffic rules, or simple flight constraints, which struggle to scale to complex behaviors. It contrasts these with intention inference methods for trajectory prediction that cluster attributes and predict future paths using snapshot data, often omitting continuous flight physics. Physics-informed learning has shown benefits via loss regularization or conservation-law-based architectures, yet a gap remains in uncovering hidden intention and complex drone capabilities by integrating control physics with data-driven models.
Methodology
CPhy-ML integrates data-driven models with control and flight physics across three tasks: (1) Multi-expert prediction and anomaly detection, (2) Trajectory intention inference, and (3) Reward intention inference.
- Data and feature preparation: Open-access telemetry datasets and custom real-world flights are used to generate heterogeneous synthetic radar tracks (Stone Soup-based simulator with EKF; process noise and radar location augmentation). RF sensor data with state and control inputs (Euler angles and thrust) is also collected. Trajectories are segmented into sub-trajectories using time windows (8, 16, 32, 64 s). Features include Sub-trajectory Features (normalized time series) and Summary Features (mean, std, min, max per feature); a windowing and summary-feature sketch follows this list. Splits: 75% train, 15% validation, 10% test.
- Hybrid intention classifier and novelty detector: A convolutional bidirectional LSTM with attention (CBLSTMA) serves as the classifier (encoder: 1D CNN + bi-LSTM; classifier: attention + dense softmax). Regularization uses dropout and early stopping. A deep LSTM autoencoder shares the classifier's encoder layers and provides reconstruction error for novelty detection. A composite loss, a weighted sum of categorical cross-entropy (classification) and reconstruction MSE (autoencoder) with alpha = 0.95, balances class prediction and anomaly detection; a loss sketch follows this list.
- Deep mixture of experts (DMoE) for trajectory intention regression: To predict future airspace occupancy (bounding boxes), a DMoE with m multi-input CNN experts is used. Each expert ingests Sub-trajectory sequences via a 1D CNN and Summary Features via a DNN; the embeddings are concatenated for regression. The experts' outputs are weighted by the hybrid classifier's softmax probabilities to produce class-conditional bounding boxes (a gating sketch follows this list). Huber loss is used for training.
- Trajectory intention prediction for anomalous trajectories: For unseen or anomalous profiles, reservoir computing (RC) predicts future trajectories with a linear readout. A physics-informed feedback augments the reservoir weights (PIRC) to stabilize and constrain predictions using control-physics-derived feedback, enhancing robustness across prediction horizons; a reservoir sketch follows this list.
- Linear drone modeling for noise suppression and control context: Dynamic Mode Decomposition with control (DMDc) learns linear models from RF data using Euler angles and thrust as inputs. A discrete LQR controller closes the loop to track references and produce noise-suppressed state estimates, enabling consistent small-angle conditions and aiding downstream prediction; a DMDc + LQR sketch follows this list.
- Reward intention inference: Intention is defined as a surjective mapping from reward-function space to state-action behavior. The reward is modeled as a quadratic form in states and inputs (LQR structure; written out after this list). An off-policy, model-based reward-shaping inverse reinforcement learning algorithm is proposed to infer both the state and control weight matrices (Q and R) by integrating reward feedback to constrain the learning manifold. Performance is evaluated via root mean squared spectral norm error (RMSSNE) and convergence of gains and weights.
- Experimental setup: Hardware includes a custom quadcopter with BeagleBone Blue, KV-8816 motors, 14.8V LiPo battery, tracked by a 25-camera VICON system at 120 Hz. Datasets used: UAV Attack, ALFA, ICMCIS drone tracking, package delivery UAV data, and authors’ custom data. Hyperparameters and additional details are provided in supplementary materials.
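The sketches below are minimal, hedged illustrations of the components above, not the authors' implementations. First, the sub-trajectory windowing and summary-feature extraction: the sampling period, feature columns, and min-max normalization are assumptions made for clarity.

```python
import numpy as np

def make_subtrajectories(track, window_s=32, dt=1.0):
    """Slice a (T, F) telemetry track into fixed-length sub-trajectories.

    track    : array of shape (T, F), e.g. columns [x, y, z, vx, vy, vz] (assumed)
    window_s : window length in seconds (the paper uses 8, 16, 32 and 64 s)
    dt       : sampling period in seconds (assumed here)
    """
    win = int(window_s / dt)
    subs = [track[i:i + win] for i in range(0, len(track) - win + 1, win)]
    return np.stack(subs)                                    # (N, win, F)

def summary_features(subs):
    """Summary Features: mean, std, min and max of every feature in each window."""
    return np.concatenate(
        [subs.mean(1), subs.std(1), subs.min(1), subs.max(1)], axis=1
    )                                                         # (N, 4 * F)

def normalize(subs):
    """Sub-trajectory Features: min-max normalize each window per feature."""
    lo = subs.min(axis=1, keepdims=True)
    hi = subs.max(axis=1, keepdims=True)
    return (subs - lo) / (hi - lo + 1e-8)
```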
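Next, the composite loss of the hybrid classifier/novelty detector, a weighted sum of categorical cross-entropy and reconstruction MSE with alpha = 0.95. A PyTorch sketch of that weighting; the tensor shapes and the threshold-based novelty rule are assumptions.

```python
import torch
import torch.nn.functional as F

ALPHA = 0.95  # weight on the classification term, as reported above

def composite_loss(class_logits, labels, reconstruction, inputs, alpha=ALPHA):
    """Weighted categorical cross-entropy + reconstruction MSE.

    class_logits   : (B, n_classes) output of the attention + dense softmax head
    labels         : (B,) integer intention-class labels
    reconstruction : (B, T, F) output of the LSTM autoencoder
    inputs         : (B, T, F) original sub-trajectory windows
    """
    ce = F.cross_entropy(class_logits, labels)      # intention classification
    mse = F.mse_loss(reconstruction, inputs)        # reconstruction / novelty
    return alpha * ce + (1.0 - alpha) * mse

# At inference time, a window whose reconstruction error exceeds a threshold
# calibrated on validation data can be flagged as a novel (unseen) profile.
```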
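The deep mixture of experts combines class-specific regressors using the hybrid classifier's softmax probabilities as gate weights. A schematic PyTorch sketch; the expert architecture, embedding sizes, and the 6-value bounding-box output are simplifying assumptions.

```python
import torch
import torch.nn as nn

class Expert(nn.Module):
    """One multi-input expert: 1D CNN over sub-trajectories, DNN over summaries."""
    def __init__(self, n_features, n_summary, box_dim=6):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.dnn = nn.Sequential(nn.Linear(n_summary, 32), nn.ReLU())
        self.head = nn.Linear(64, box_dim)           # future-occupancy bounding box

    def forward(self, sub, summ):
        # sub: (B, T, F) sub-trajectory windows, summ: (B, n_summary) summaries
        z = torch.cat([self.cnn(sub.transpose(1, 2)), self.dnn(summ)], dim=-1)
        return self.head(z)

class DMoE(nn.Module):
    """Experts weighted by the intention classifier's softmax probabilities."""
    def __init__(self, n_classes, n_features, n_summary):
        super().__init__()
        self.experts = nn.ModuleList(
            [Expert(n_features, n_summary) for _ in range(n_classes)])

    def forward(self, sub, summ, class_probs):
        boxes = torch.stack([e(sub, summ) for e in self.experts], dim=1)  # (B, m, 6)
        return (class_probs.unsqueeze(-1) * boxes).sum(dim=1)             # (B, 6)

# Training would minimize torch.nn.HuberLoss() between predicted and true boxes,
# matching the Huber loss mentioned above.
```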
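For the anomalous-trajectory predictor, an echo-state-network style reservoir with a ridge-regularized linear readout is one way to realize the RC component; the physics-informed correction that PIRC injects into the reservoir update is represented only by a placeholder argument, since the exact feedback construction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

class Reservoir:
    def __init__(self, n_in, n_res=300, rho=0.9, leak=0.3, ridge=1e-6):
        self.Win = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        self.W = W * (rho / max(abs(np.linalg.eigvals(W))))   # set spectral radius
        self.leak, self.ridge, self.Wout = leak, ridge, None

    def _step(self, state, u, physics_feedback=0.0):
        # physics_feedback stands in for the control-physics term PIRC adds to the
        # reservoir update to keep long-horizon predictions bounded.
        pre = self.W @ state + self.Win @ u + physics_feedback
        return (1 - self.leak) * state + self.leak * np.tanh(pre)

    def fit(self, U, Y):
        """Teacher-forced pass over inputs U (T, n_in) and targets Y (T, n_out)."""
        states, s = [], np.zeros(self.W.shape[0])
        for u in U:
            s = self._step(s, u)
            states.append(s)
        S = np.asarray(states)
        # Ridge-regularized linear readout.
        self.Wout = Y.T @ S @ np.linalg.inv(S.T @ S + self.ridge * np.eye(S.shape[1]))

    def predict(self, u0, steps):
        """Free-run forecast, feeding predictions back as inputs (n_out == n_in)."""
        s, u, out = np.zeros(self.W.shape[0]), u0, []
        for _ in range(steps):
            s = self._step(s, u)
            u = self.Wout @ s
            out.append(u)
        return np.asarray(out)
```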
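The linear-modeling step can be summarized as a least-squares DMDc fit followed by a discrete LQR gain obtained from the Riccati equation; matrix shapes and cost weights here are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.linalg import lstsq, solve_discrete_are

def dmdc(X, U):
    """Dynamic Mode Decomposition with control: fit x_{k+1} ~ A x_k + B u_k.

    X : (n, T)   state snapshots x_0 ... x_{T-1}
    U : (m, T-1) control inputs (e.g. Euler angles and thrust) u_0 ... u_{T-2}
    """
    X0, X1 = X[:, :-1], X[:, 1:]
    Omega = np.vstack([X0, U])               # stacked states and inputs, (n+m, T-1)
    G, *_ = lstsq(Omega.T, X1.T)             # least-squares solution, (n+m, n)
    AB = G.T
    n = X.shape[0]
    return AB[:, :n], AB[:, n:]              # A (n, n), B (n, m)

def dlqr(A, B, Q, R):
    """Discrete LQR gain K for the tracking law u_k = -K (x_k - x_ref_k)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Simulating the closed loop x_{k+1} = A x_k + B u_k with u_k = -K (x_k - x_ref_k)
# yields the noise-suppressed, small-angle-consistent state estimates described above.
```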
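Finally, the reward-intention item assumes an LQR-structured quadratic objective. Written out in standard discrete-time LQR notation (not copied from the paper), the quantities the reward-shaping IRL must recover are the weight matrices Q and R:

\[
r(x_k, u_k) = x_k^\top Q\, x_k + u_k^\top R\, u_k, \qquad
u_k = -K x_k, \quad K = (R + B^\top P B)^{-1} B^\top P A,
\]
\[
P = A^\top P A - A^\top P B \,(R + B^\top P B)^{-1} B^\top P A + Q .
\]

The inverse problem runs this relationship backwards: from demonstrated state-action behavior (equivalently, an estimated gain K), infer Q and R, with the RMSSNE between inferred and true matrices used as the evaluation metric.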
Key Findings
- Overall gains: CPhy-ML improves trajectory prediction performance by 48.28% over traditional methods and substantially enhances reward inference accuracy versus conventional IRL (RMSSNE reduction from 3.3747 to 0.3229).
- Trajectory intention classification and novelty detection: The CBLSTMA achieved 97.95% test accuracy; the hybrid classifier with integrated novelty detector achieved 97.75% test accuracy (precision 0.9735, recall 0.9822, F1 0.9777) with reconstruction MSE 0.4511, enabling detection of unseen profiles. Models with attention (CBLSTMA, CNNA) outperformed those without attention. Prediction times across RNN-based classifiers were on the order of 3.35e-5 s per sample.
- Trajectory intention regression (future airspace bounds): The DMoE achieved the best performance with R^2 = 0.7482 on the test set (RMSE 70.2524, MAE 28.0174), outperforming single multi-input CNN (R^2 = 0.3206) and other baselines. Training time for DMoE was 655.25 s; prediction time 6.19e-5 s.
- Trajectory intention prediction with physics-informed RC: Across horizons (1, 10, 100, 1000 steps), the physics-informed RC (PIRC) improved average test MSE from 4.5212 (linear RC) to 1.6471, with the largest gain at the long horizon (1000-step test MSE 5.9048 vs 17.3744). At short horizons, PIRC performed on par with linear RC (e.g., 10-step test MSE 0.1481 vs 0.1475).
- Linear DMDc + LQR modeling: The closed-loop DMDc + LQR scheme produced accurate, noise-suppressed state estimates for both periodic and non-periodic trajectories, with average per-position-axis MSE of about 0.063 (periodic) and 0.058 (non-periodic).
- Reward inference (IRL): The proposed reward-shaping IRL converged stably to near-true Q and R, outperforming gradient-based and model-based IRL baselines, including for non-diagonal weight structures. Reported RMSSNE values fell from baseline values of 2.3910 and 1.9802 to 0.1942 (diagonal weights) and from 6.2848 and 3.3747 to 0.3229 (non-diagonal weights). The method inferred Q and R simultaneously and yielded fast convergence of the control gains and kernel matrix.
- Anomaly indicators from reward values: High reward values flagged misbehavior in scenarios with large tracking errors or violations of small-angle conditions; at high velocities, drag forces attenuated these effects, altering the reward-magnitude patterns.
Discussion
By integrating deep learning with control-theoretic and physics-informed models, CPhy-ML addresses the core challenge of intention being latent and unobservable. The hybrid classifier captures high-dimensional spatiotemporal patterns while a novelty detector flags unseen behaviors, supporting counter-drone operations by reducing both false positives and over-trust. The DMoE decomposes regression by intention class, improving future airspace occupancy estimates and providing interpretable bounding boxes that are actionable for airspace management. For trajectory prediction, physics-informed RC stabilizes long-horizon forecasts, counteracting divergence common in purely data-driven RC and enabling control-relevant, noise-reduced predictions. The DMDc + LQR loop further denoises telemetry and imposes small-angle conditions, facilitating consistent modeling across profiles. Crucially, the reward-shaping IRL uncovers the hidden control objective (Q, R), offering causal explanations of behavior and a principled anomaly indicator across mission profiles. Together, these findings demonstrate that coupling data with control physics reduces learning bias, increases robustness, and yields explainable intention inference suitable for real-time counter-UAV scenarios.
Conclusion
The study introduces CPhy-ML, a unified framework that fuses deep neural models with control physics to infer drone intentions from trajectories and control signals. Contributions include: (1) a hybrid attention-based classifier with integrated novelty detection for intention classes; (2) a deep mixture of experts to regress future airspace occupancy bounds; (3) a physics-informed reservoir computing predictor that stabilizes long-horizon forecasts; (4) DMDc + LQR for linearized, noise-suppressed state estimation; and (5) an off-policy, model-based reward-shaping IRL that reliably recovers hidden reward functions. Empirical results show significant improvements in classification accuracy, trajectory prediction robustness, and reward inference fidelity over baselines. Future research directions include expanding data richness and class diversity to improve generalization, optimizing expert sharing across classes to reduce latency, enhancing robustness of IRL-based explanations under noise and varying mission profiles, and integrating reward values as constraints within learning to further regularize and explain model behavior.
Limitations
- Data richness and variability: Performance depends on heterogeneous, rich datasets. Low variability impairs generalization of classifiers and regressors, causing mislocated bounding boxes. Synthetic data had limited variation; expanding intention classes and building an intention dictionary is advised.
- Scalability and latency: DMoE prediction time increases with the number of intention classes; more experts increase latency unless computational resources are distributed or experts are generalized across classes.
- Reservoir computing generalization: RC-based prediction benefits from rich excitation; otherwise, performance degrades. Remedies include ensuring richer training data or using mixtures of RCs with diverse reservoirs to improve heterogeneity and generalization.
- Control input usage limits: Incorporating control inputs aids short-term predictions or constant references but is sensitive to noise, which can compromise policy and reward inference. Additional state estimation and parameter identification, as well as IRL methods leveraging MPC and experience inference, are recommended.
- Reward signal availability: Using reward values to constrain learning is promising but such measurements are not standard onboard outputs, limiting practical deployment without additional instrumentation or estimation.
- Linear modeling challenges: DMDc model quality is sensitive to trajectory richness and small-angle assumptions; LQR design may require expert tuning and may not capture highly nonlinear regimes without sufficient excitation.