
Engineering and Technology
Gait switching and targeted navigation of microswimmers via deep reinforcement learning
Z. Zou, Y. Liu, et al.
This study by Zonghao Zou, Yuexin Liu, Y.-N. Young, On Shun Pak, and Alan C. H. Tsang shows how deep reinforcement learning enables a model microswimmer to develop adaptive locomotory gaits for efficient, targeted navigation, with potential applications in complex fluid environments.
Introduction
The study addresses how to endow artificial microswimmers with adaptive, multimodal gait-switching strategies akin to biological microorganisms to achieve targeted navigation in complex, low-Reynolds-number environments. Biological swimmers switch between translation and rotation modes (e.g., run-and-tumble, reverse-and-flick, run-and-spin) to navigate. Designing such adaptive gaits for artificial swimmers is challenging due to constraints of Stokesian hydrodynamics and environmental perturbations; most existing designs rely on fixed gaits and external manual control. Advances in AI and reinforcement learning suggest data-driven approaches could enable autonomous navigation. The paper proposes a deep reinforcement learning framework that allows a reconfigurable three-sphere swimmer to learn locomotory gaits for steering, transitioning, and translating, enabling targeted navigation and complex path following, and examines robustness under background flows.
Literature Review
Foundational work by Purcell and subsequent models (e.g., Najafi-Golestanian three-sphere swimmer and circle swimmer) established how non-reciprocal shape changes produce net motion at low Reynolds number. Prior artificial microswimmers often have fixed gaits and require external control. Machine learning has been applied to active particles for navigation in flows, noise, and obstacles, typically with prescribed propulsion and limited actuation degrees. Recent efforts used learning for minimal swimmers to discover self-propulsion strokes and chemotactic behaviors. However, learning adaptive, multimodal gait switching in a reconfigurable swimmer with continuous action spaces and full hydrodynamic coupling has been less explored. This work builds on reinforcement learning (PPO) and Actor-Critic methods to extend from discrete to continuous action spaces, enabling richer gait evolution.
Methodology
- Model: A planar three-sphere swimmer with sphere radius R and centers r_i (i = 1, 2, 3), connected by two arms of variable lengths L1, L2 and orientations θ1, θ2; θ31 denotes the intermediate angle between the arms.
- Hydrodynamics: Low-Reynolds-number Stokes flow, with the Oseen tensor G_ij relating forces and velocities under force-free and torque-free constraints.
- Kinematics: Actuation rates for L1 and L2 follow from the relative sphere velocities along the arm directions; θ̇31 follows from the perpendicular velocity components and the arm rotation rates θ̇1, θ̇2.
- Validity constraint (to avoid near-field breakdown of the far-field hydrodynamics): 0.6L < L1, L2 < L and 2π/3 ≤ θ31 ≤ 4π/3.
- Non-dimensionalization uses L, V, and μLV scales.
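The far-field hydrodynamic coupling described above can be sketched numerically. Below is a minimal illustration (not the authors' implementation; the function names and the sphere radius value are assumptions) of how the Oseen tensor maps point forces on the spheres to their velocities, combining the Stokes-drag self-mobility with pairwise far-field coupling:

```python
import numpy as np

def oseen_tensor(r, mu=1.0):
    """Far-field Oseen tensor G(r) = (I + r_hat r_hat) / (8 pi mu |r|)."""
    d = np.linalg.norm(r)
    rhat = r / d
    return (np.eye(3) + np.outer(rhat, rhat)) / (8.0 * np.pi * mu * d)

def sphere_velocities(positions, forces, R=0.1, mu=1.0):
    """Velocity of each sphere: Stokes-drag self-term plus Oseen coupling
    to the forces on the other spheres (valid only for well-separated spheres,
    consistent with the separation constraint in the model)."""
    n = len(positions)
    v = np.zeros_like(positions)
    for i in range(n):
        v[i] += forces[i] / (6.0 * np.pi * mu * R)  # self-mobility
        for j in range(n):
            if j != i:
                v[i] += oseen_tensor(positions[i] - positions[j], mu) @ forces[j]
    return v
```

The force- and torque-free constraints of the paper would then be imposed on top of this mobility relation to solve for the swimmer's rigid-body motion.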
- Reinforcement learning: Actor-Critic neural networks trained with PPO. The Actor outputs a Gaussian policy πθ(a|o); the Critic estimates the value Vφ.
- State and observation: The state s includes r1, L1, L2, θ1, θ2. The observation is o = (L1, L2, θ31, cos θa, sin θa), where θa = θr − θs, the swimmer orientation is θs = arg(r − r1), and r is the centroid.
- Action and reward: The action a sets the actuation of (L1, L2, θ31) at each step Δt; the reward r_t is the displacement of the centroid along the target direction θr.
- Training protocol: Episodes of Ns = 150 steps, Ne episodes in total, with the initial configuration and θr randomized each episode to promote exploration; discount factor γ = 0.99. Updates are performed every 20 episodes using the clipped PPO objective with the Adam optimizer; entropy regularization encourages exploration.
- Performance tests: After training to various Ne, success rates are assessed over 100 trials for (i) a random target test (θr ∈ [0, 2π)), (ii) a rotation test (|θa| = π/2), and (iii) a translation test (θa = 0); a trial succeeds if the swimmer travels 5 units along the target direction within 10,000 steps.
- Path tracing: The swimmer traces the word “SWIM” by sequentially receiving 17 target points; a new target is assigned once the swimmer comes within 0.1L of the current point.
- Robustness tests: A background rotlet flow u = γ ez × r / r^3 is imposed with strengths γ = 0.15 and 1.5; swimmers are initialized at re = −5 ex with initial orientations θ0 ∈ {−π/3, 0, π/3}, and the AI-powered swimmer is compared against an untrained Najafi-Golestanian (NG) swimmer using a fixed translation gait.
- Implementation details of PPO (trajectory collection, surrogate loss, advantage A_t = R_t − Vφ, clipping, epochs, minibatches) are provided in the Methods algorithms.
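The PPO update described above rests on two ingredients: discounted returns feeding the advantage A_t = R_t − Vφ, and the clipped surrogate objective. A minimal NumPy sketch of both (the clipping parameter ε = 0.2 is a common PPO default assumed here, not a value stated in the source):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go R_t = sum_k gamma^k r_{t+k}, accumulated backwards."""
    running, out = 0.0, []
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return np.array(out[::-1])

def clipped_surrogate(ratio, adv, eps=0.2):
    """Per-sample PPO objective min(rho*A, clip(rho, 1-eps, 1+eps)*A),
    where rho is the new-to-old policy probability ratio; training
    maximizes its mean (equivalently, minimizes the negative)."""
    return np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)
```

The clipping removes the incentive to move the policy ratio outside [1 − ε, 1 + ε], which stabilizes the updates performed every 20 episodes.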
Key Findings
- Emergent multimodal gaits: The trained swimmer autonomously discovers and switches among three distinct locomotory gaits for targeted navigation: steering (dominant rotation with small translation by varying L2 and θ31 while L1 remains contracted), transition (simultaneous modulation of L1, L2, θ31 yielding concurrent rotation and translation for fine orientation adjustment), and translation (Najafi-Golestanian-like stroke with L1 and L2 cycling and θ31 ≈ π for maximal translation with minimal rotation). These gaits occupy distinct regions and cycles in the configuration space (L1, L2, θ31).
- Targeted navigation sequence: From arbitrary initial orientation, navigation proceeds through steering → transition → translation phases to align with and move along θr.
- Quantitative performance improvements with training: Over 100 trials, at Ne = 3×10^4, success rates ~90% for random target, rotation, and translation tests. At Ne = 9×10^4, translation achieves 100% success; rotation still improving. At Ne = 1.5×10^5, all three tests reach 100% success. For Ne much larger than 1.5×10^5, performance becomes non-monotonic with success fluctuating around ~95%.
- Episode length effect: With fixed total episodes Ne, varying Ns from 100 to 300 shows Ns = 150 best balances learning of rotation and translation. At Ns = 100, swimmer learns translation but not rotation, indicating rotation gaits are harder to learn and require more steps.
- Complex path tracing: Without explicit gait programming, the swimmer accurately traces the word “SWIM” by sequentially navigating to landmark points, autonomously switching among gaits; shallow corners are negotiated without steering gaits, while sharper turns require steering.
- Robustness to background flows: In a weak rotlet (γ=0.15), the AI-powered swimmer maintains motion towards +x regardless of initial orientation, whereas the NG swimmer’s trajectory is strongly deflected by the flow. In a strong rotlet (γ=1.5), the NG swimmer loses directional control; the AI-powered swimmer initially circulates but escapes and proceeds along +x with similar trajectories for different initial orientations by adopting transition gaits to constantly re-orient.
- Generalization: The trained policy, learned without flows, exhibits robustness when deployed in flow conditions, indicating adaptability of the learned multimodal strategy.
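The rotlet background flow used in the robustness tests above is straightforward to evaluate. A sketch (function name assumed) of the field u = γ ez × r / r^3 from the Methodology:

```python
import numpy as np

def rotlet_velocity(pos, gamma=0.15):
    """Background rotlet flow u = gamma * (e_z x r) / |r|^3: a swirling
    field about the z-axis whose magnitude decays as 1/|r|^2."""
    r = np.asarray(pos, dtype=float)
    ez = np.array([0.0, 0.0, 1.0])
    return gamma * np.cross(ez, r) / np.linalg.norm(r) ** 3
```

At the swimmers' initial position re = −5 ex the flow magnitude is γ/25, so the strong case γ = 1.5 perturbs the swimmer ten times more than the weak case γ = 0.15, consistent with the NG swimmer losing directional control only in the strong flow.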
Discussion
The findings demonstrate that deep reinforcement learning can endow a reconfigurable microswimmer with adaptive gait-switching, enabling it to autonomously align and translate toward arbitrary targets under low-Reynolds-number constraints. By discovering steering, transition, and translation gaits and switching between them based on the target orientation error, the swimmer achieves reliable targeted navigation without explicit gait design or prior hydrodynamic knowledge. The strategy parallels biological run-and-tumble behavior and shows resilience to environmental perturbations, including significant rotational flows. The ability to robustly trace complex paths and counteract flow-induced disturbances underscores the potential of AI-guided microrobots for tasks such as targeted drug delivery and microsurgery in unpredictable media. The observed learning dynamics (e.g., greater difficulty of rotation gaits, dependence on episode length, and non-monotonicity with excessive training) provide insights for optimizing training protocols for microrobotic control policies.
Conclusion
This work introduces a deep reinforcement learning framework (Actor-Critic with PPO) that enables a low-Reynolds-number, three-sphere reconfigurable swimmer to self-learn multimodal locomotory gaits and perform autonomous targeted navigation and complex path tracing. The learned strategy comprises distinct steering, transition, and translation gaits, with robust gait switching that generalizes to flow-perturbed environments where it outperforms a fixed-gait Najafi-Golestanian swimmer. The approach avoids manual gait design and shows promise for developing smart, adaptive microswimmers for biomedical applications. Future directions include extending to fully three-dimensional navigation by adding out-of-plane degrees of freedom; applying the framework to other swimmer architectures; incorporating background flows during training to exploit environmental cues; systematically accounting for Brownian noise; and handling boundaries and obstacles to operate in realistic, complex environments.
Limitations
- The study demonstrates planar motion only; out-of-plane rotations and full 3D navigation are not implemented.
- Hydrodynamic modeling uses the far-field Oseen tensor and enforces separation constraints; near-field and lubrication effects are neglected.
- Training was performed without background flows; robustness tests use the trained policy zero-shot in flows but do not co-train with flows.
- Thermal (Brownian) fluctuations, boundaries, and obstacles are not included in training or primary evaluations.
- Performance degrades non-monotonically with excessive training episodes, indicating potential overfitting or instability in policy optimization.
- Rotation gaits require more training steps/episodes to learn effectively, suggesting sensitivity to hyperparameters such as episode length Ns.
- Results are simulation-based; no experimental validation is presented.