
Engineering and Technology
Wing-strain-based flight control of flapping-wing drones through reinforcement learning
T. Kim, I. Hong, et al.
Explore the future of drone technology as researchers, including Taewi Kim and Insic Hong from Ajou University, unveil a groundbreaking wing-strain-based flight controller for flapping-wing drones, enabling advanced flight data acquisition without traditional sensors. This innovation enhances gust resistance and allows for autonomous wind-assisted flight.
~3 min • Beginner • English
Introduction
The study addresses the challenge of robust control and environmental sensing in flapping-wing micro aerial vehicles (FWMAVs), which, despite their agility and efficiency, are difficult to control under unsteady aerodynamics and in windy conditions. Conventional approaches rely on IMUs, cameras, and model-based controllers, which can fall short because of delayed disturbance detection, the rigidity of sensors mounted on flexible airframes, and complex flexible-wing aerodynamics that defy simple tuning. Biological inspiration comes from insects whose wings host campaniform sensilla that sense strain and encode aeroelastic information, potentially including inertial and Coriolis effects, enabling immediate proprioceptive control. The authors hypothesize that wing strain alone provides sufficient state information (attitude and airflow) for agile control, and they propose a reinforcement-learning (RL) fly-by-feel controller that maps wing-base strain inputs directly to control actions, tested across five progressively complex experimental paradigms.
Literature Review
Prior work on FWMAVs (e.g., RoboBee, DelFly, Purdue Hummingbird) demonstrates impressive agility but highlights control and lightweight design challenges. Standard feedback systems using IMUs and vision suffer from delays and rigidity mismatches with flexible platforms. Biological literature shows insects use campaniform sensilla to detect aeroelastic loads and possibly inertial effects; insects maintain stability and navigate long distances by integrating mechanosensory, visual, and olfactory cues. Engineering attempts to use wing deformation for control exist for fixed/rigid wings, but success on flapping drones is limited due to highly nonlinear, coupled aeroelastic effects. Recent computational models of flapping aerodynamics are resource-intensive and often impractical for real-time control, motivating data-driven control strategies. This work builds on these insights by deploying ultrasensitive crack-based strain sensors and end-to-end RL for real-time control without traditional state sensors.
Methodology
Sensors and platform: Two ultrasensitive crack-based strain sensors (metal film on 7.5 µm polyimide; ~3 mg each; gauge factor ~30,000) were attached at the wing bases of a commercial flapping drone (MetaFly; 10 g; 13.5 cm wingspan). Nanoscale cracks in the metal film provide the high sensitivity, and the sensors are light enough to leave the wing dynamics essentially unchanged. The drone has two motors: a body motor that drives flapping (thrust) and a tail motor for steering (direction). Signal acquisition used bandpass filtering (0.1–55 Hz) followed by normalization.
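A minimal sketch of this preprocessing step, assuming a zero-phase Butterworth bandpass; the summary specifies only the 0.1–55 Hz band and normalization, so the sampling rate, filter order, and normalization scheme below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200.0  # assumed sampling rate in Hz (not given in the summary)

def preprocess_strain(raw: np.ndarray, low=0.1, high=55.0, order=4) -> np.ndarray:
    """Bandpass-filter (0.1-55 Hz) and normalize one strain channel."""
    nyq = FS / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    filtered = filtfilt(b, a, raw)  # zero-phase filtering, no group delay
    # Illustrative normalization: zero mean, unit variance per channel.
    return (filtered - filtered.mean()) / (filtered.std() + 1e-8)
```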
State extraction and learning: Wing strain encodes aeroelastic features. A temporal 1D convolutional neural network (CNN) processes histories of the sensor signals to extract state for an RL controller. Control is formulated as a partially observable Markov decision process (POMDP) whose observations are the 1D strain streams. Time is discretized at 0.05 s, and the agent state s_t is the most recent 32 observations (1.6 s). Actions are continuous motor torques a_t ∈ [0,1] (duty cycles), and the discount factor is γ = 0.98. Transitions are stored in a replay buffer and sampled in minibatches of 64. The Soft Actor-Critic (SAC) algorithm provides off-policy, entropy-regularized learning, optimizing reward plus an entropy bonus weighted by a temperature α, with separate networks for the policy π_θ and the Q-functions. The policy network stacks 1D CNN layers (e.g., 32 filters of width 5 with max pooling), flattens and embeds the features (e.g., 256 dimensions), and outputs the parameters of a Gaussian action distribution; the Q-network additionally concatenates an action embedding.
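The architecture figures quoted above (32-sample history, 32 filters of width 5, max pooling, 256-dimensional embedding, Gaussian action head) can be assembled roughly as follows. This PyTorch sketch is an assumption-laden illustration, not the authors' implementation; in particular, the two-channel input and the sigmoid squashing into [0, 1] are plausible choices, not confirmed details:

```python
import torch
import torch.nn as nn

class StrainPolicy(nn.Module):
    """Illustrative SAC policy: a 1D CNN over the last 32 strain samples
    (2 wing-base channels) emitting Gaussian parameters for 2 motor
    duty cycles squashed into [0, 1]."""

    def __init__(self, channels=2, history=32, n_actions=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),                       # 32 -> 16 time steps
            nn.Flatten(),
            nn.Linear(32 * (history // 2), 256),   # 256-dim embedding
            nn.ReLU(),
        )
        self.mu = nn.Linear(256, n_actions)
        self.log_std = nn.Linear(256, n_actions)

    def forward(self, strain):                     # strain: (batch, 2, 32)
        h = self.encoder(strain)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()                       # reparameterized sample
        return torch.sigmoid(raw)                  # duty cycles in [0, 1]
```

In standard SAC a tanh squashing with a log-probability correction is more common; a sigmoid is shown here only because the actions are duty cycles in [0, 1].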
Experimental phases:
1) State-informative validation: The drone's orientation was systematically varied (combinations of yaw, pitch, and roll) to create 62 wind directions; combined with three wind speeds (reported as 3, 3.5, and 5–7 m/s), this yielded 186 wind vectors. Strain data were fed to a 1D CNN for classification/regression of wind direction and speed, and errors and confusion matrices were used to evaluate predictive performance.
2) 1 DOF control: The drone was mounted on a rotary encoder via a 20-cm arm so it could move along a circle (α from 0° to 350°). Constant winds of 3 m/s and 5 m/s were applied in different angular sectors, and a torsion spring returned the drone to the start position when it was not flapping. Only strain signals and motor power were inputs; the encoder angle was used solely to compute the reward, a Gaussian peaking at α = 180° (sketched after this list). SAC trained the thrust control to reach and maintain the target position under varying apparent wind.
3) 2 DOF control: The body was connected to rotary joint 1 (controlling α) and pin joint 2 (allowing pitch β from −30° to 60°). A 5 m/s wind induced drag and counter-torques. The agent, using a single wing-base sensor, had to hold a near-horizontal pitch to advance counterclockwise and maximize reward; from a falling state (β < 0°), it had to stop flapping and let the ambient wind restore the pitch.
4) Position control in a windy environment: In a wind tunnel with an asymmetric turbulent flow (visualized via smoke and optical flow), a tethered drone used both sensors and both motors for 3D movement; the reward was computed from motion capture, which was not used as a policy input. The agent learned to approach and stay near designated targets, and its performance was compared with untrained policies.
5) Flight path control in a windless environment: In a 6×8×4 m volume (operational region 2×3×2 m due to tethers), with a launcher standardizing takeoff, the agent learned free-flight manoeuvres (left/right turns, zigzags, circles) and altitude regulation using only the two strain sensors. Human demonstration episodes were added to the replay buffer to aid learning (sketched below the reward example). Odometry reconstructed from the strain signals (0.5 s windows, a model with dropout) was compared against motion-capture ground truth to assess trajectory reconstruction.
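For the 1 DOF task (item 2), only the reward's Gaussian shape and its 180° peak are given; a minimal sketch, with the width σ chosen arbitrarily for illustration:

```python
import numpy as np

def reward_1dof(alpha_deg: float, target_deg: float = 180.0,
                sigma_deg: float = 30.0) -> float:
    """Gaussian reward peaking at the target angle; sigma is an
    illustrative assumption (the summary gives only the peak)."""
    # Wrap the angular error into [-180, 180) degrees.
    err = (alpha_deg - target_deg + 180.0) % 360.0 - 180.0
    return float(np.exp(-0.5 * (err / sigma_deg) ** 2))
```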
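For the free-flight task (item 5), human demonstrations were mixed into the replay buffer before training; a minimal sketch of such seeding, with the buffer layout and episode format assumed for illustration:

```python
import random
from collections import deque

# Stores (state, action, reward, next_state, done) tuples.
replay_buffer = deque(maxlen=100_000)

def seed_with_demonstrations(demo_episodes):
    """Insert human-piloted transitions so early minibatches already
    contain successful behaviour (episode format is an assumption)."""
    for episode in demo_episodes:
        replay_buffer.extend(episode)

def sample_minibatch(batch_size=64):
    """Uniform sampling, matching the minibatch size quoted above."""
    return random.sample(list(replay_buffer), batch_size)
```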
Key Findings
- State inference from strain: Across 12,416 validation cases, mean absolute wind angle error was 29°, with 76% of errors within 0–33°. Mean speed error was 26%, with 63.5% within 0–30%. Classification of 186 wind cases achieved mean AUC ≈0.99 and mean accuracy ≈80%; accuracy was highest for frontal and lower-left/right wind sectors.
- 1 DOF control: The SAC-controlled drone reached and maintained target angles (e.g., 180°, 270°) using only strain signals, adapting flapping power to varying apparent wind. It outperformed untrained baselines (minimum, maximum, random power) in accumulated reward and recovered target position after induced disturbances.
- 2 DOF control: The trained agent recognized a transition to a falling state within ~0.1 s and reduced motor power to zero to recover optimal pitch, then resumed flapping to advance. It maintained balance under disturbances and achieved higher rewards over training.
- Wind tunnel position control: Using two strain sensors without IMU/vision, the trained drone stayed closer to targets than untrained policies, with reduced positional variance, increasing scores, and decreasing policy entropy. It generalised to different target locations (left/right) despite flow-field and hardware variations.
- Windless free flight: The agent executed zigzag and circular trajectories using only strain sensors, with scores improving and policy entropy decreasing relative to manual control (which succeeded in only ~30% of attempts). It coordinated the thrust and directional motors to hold altitudes above or below the takeoff height. Strain-based odometry reconstructed trajectories with decreasing MSE across sessions and showed good agreement with ground truth in representative trials.
Discussion
The results validate the central hypothesis that wing-base strain encodes sufficient state information—wind direction, speed, and vehicle attitude—to enable agile, adaptive control of flapping-wing drones without traditional sensors. By directly mapping strain histories to control actions through end-to-end RL, the system handles complex, unsteady aeroelastic interactions that are difficult to model explicitly. Successful execution of positioning, balancing, navigation, and path-following tasks in both windy and calm environments demonstrates the feasibility and robustness of fly-by-feel control. The approach reduces reliance on IMUs, cameras, and aerodynamic models, offering resilience when conventional sensing is compromised and enabling rapid adaptation to airflow changes, akin to biological flyers.
Conclusion
This work introduces a bio-inspired fly-by-feel control paradigm for flapping-wing drones using only two ultrasensitive wing-base strain sensors and an RL controller. Across five progressively challenging experiments, the system inferred airflow and attitude, achieved target holding under varying winds, stabilized pitch in a two-DOF setting, maintained position in complex wind fields, and executed free-flight trajectories in calm air, with odometry derived from strain. These contributions show that strain sensing can serve as a primary modality for real-time control in FWMAVs. Future work will explore multi-sensor configurations on wings to separate gravity- from wind/acceleration-induced deformation for advanced behaviours (hovering, wind-assisted flight) and investigate fusion with gyroscopes/accelerometers to enhance precision and stability while minimizing redundancy.
Limitations
- Miniaturization: Although the sensors are light and thin (~3 mg), further miniaturization is needed for subgram platforms whose wings themselves weigh on the order of 3 mg, for which the current sensor would roughly double the wing mass.
- Sensing separability: Distinguishing gravitational effects from wind and inertial accelerations using only two sensors is challenging; accurate gravity sensing may require multiple, spatially distributed sensors.
- Experimental constraints: Several experiments used tethers, wind tunnels, and motion capture for reward computation, which may limit immediate generalization to untethered, outdoor, or fully onboard systems.
- Hardware and flow variability: Performance differences across target locations suggest sensitivity to flow-field asymmetries and hardware limits.
- Limited onboard sensing/compute: Current setup avoids IMUs/vision to prove feasibility; practical systems may need sensor fusion and onboard compute, which introduce integration challenges.