Emergent behaviour and neural dynamics in artificial agents tracking odour plumes
S. H. Singh, F. V. Breugel, et al.
Locating the source of an odour in a windy environment is a challenging control problem due to intermittent odour signals, changing wind direction, and variability in plume shape. Insects solve this by integrating current and past egocentric odour, visual, and wind signals to determine actions, implying a role for memory. While wind tunnel, virtual reality, and outdoor experiments have characterized behaviour, collecting substantial neural data during free flight is difficult. The study poses a normative, complementary in silico question: can artificial recurrent neural network (RNN) agents trained via deep reinforcement learning (DRL) track dynamic, turbulent-like odour plumes, and what behavioural strategies, memory demands, and neural dynamics emerge? The purpose is to develop an integrated understanding that bridges behaviour and neural computations, to generate testable hypotheses (e.g., centreline versus wind-direction tracking under non-stationary winds), and to provide intuition for memory requirements in odour plume tracking.
Recent DRL-based models have been used to probe neural function and behaviour in domains such as motor control, time encoding, reward-based learning, meta-learning, and abstract task representations. The closest related works are: Merel et al., who trained a virtual rodent with a deep ANN to solve tasks and found emergent behavioural and neural similarities to real rodents; Reddy et al., who used RL agents on static odour trails and showed zig-zagging strategies in terrestrial tracking with single or dual sensors; and Rapp & Nawrot, who modelled a biologically detailed spiking mushroom body controlling foraging in turbulent plumes (constant wind, with distractor odours). This study differs by using a more challenging, dynamic, and stochastic plume environment (including switching winds), by simplifying the neural architecture to vanilla RNNs to enable dynamical-systems analyses, and by omitting biomechanical and visual detail to maintain tractability. The approach focuses on general principles of plume tracking, analysing emergent behaviours and network-level neural dynamics that are robust to architectural and hyperparameter variations.
Plume simulation: A particle-based 2D plume model reproduces turbulent-like features: intermittency, rapid concentration fluctuations, Gaussian time-averaged cross-sections, and meandering structure. Odour puffs are emitted as a Poisson process from a source at the origin, advected by a homogeneous wind field (0.5 m/s) with configurable direction, diffuse radially with concentration scaling inversely with puff radius, and receive Gaussian cross-wind perturbations. Arena bounds: x in [−2, +10] m, y in [−5, +5] m; simulated at 100 Hz. Four wind/plume configurations are used: (1) constant wind; (2) switch-once (a single 45° anticlockwise change during the episode); (3) switch-many (random changes approximately every 3 s, drawn from N(0, 45°) truncated at ±60°); (4) sparse (puff birth rate reduced to 0.4×). Additional 'sparser' demonstrations also reduce puff radial diffusion to 0.5×.

Agent architecture: Actor-critic networks with a vanilla RNN (64 tanh units) feeding two-layer MLP actor and critic heads. Inputs at each 40 ms time step (25 Hz) form a 3D vector: egocentric wind velocity (x, y) and local odour concentration. Continuous outputs: turning (±6.25 rad/s) and forward movement (≤2.5 m/s), matched to Drosophila capabilities. RNN recurrent weights use a normal initialization; feedforward weights use orthogonal initialization. For comparison, feedforward MLPs receive fixed-length histories by appending past observations (2–12 steps) to the input, enabling controlled analyses of memory capacity.

Training: Proximal Policy Optimization (PPO) with a curriculum and reward shaping that strongly rewards homing, mildly rewards reductions in radial distance, and penalizes long trajectories and large strays from the plume. 14 seeds per architecture; the top 5 by total successes across configurations are selected for analysis. Episodes run at 25 steps/s with a maximum of 300 steps (12 s). Hyperparameters include learning rate 3e−4 (linear decay), entropy coefficient 0.05, value-loss coefficient 0.5, 10 PPO epochs, γ = 0.99, GAE λ = 0.95, and max gradient norm 0.5.
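The puff-based simulator described above can be sketched roughly as follows. The emission rate, growth rate, initial radius, and jitter scale below are illustrative placeholders, not values from the study; only the 100 Hz step, 0.5 m/s wind, arena extent, and the Poisson-birth/advection/diffusion/jitter structure come from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

DT = 0.01          # 100 Hz simulation step (as in the text)
WIND_SPEED = 0.5   # m/s, homogeneous wind field (as in the text)
BIRTH_RATE = 20.0  # puffs/s; illustrative, not from the study
GROWTH = 0.01      # radial growth rate (m/s); illustrative
JITTER = 0.005     # cross-wind Gaussian jitter (m/step); illustrative

def step_puffs(puffs, wind_dir_rad):
    """One 10 ms update: Poisson births, advection, diffusion, jitter.

    puffs: (N, 3) array of [x, y, radius].
    """
    # Poisson births at the source (origin)
    n_new = rng.poisson(BIRTH_RATE * DT)
    new = np.zeros((n_new, 3))
    new[:, 2] = 0.01  # initial puff radius (m); illustrative
    puffs = np.vstack([puffs, new])

    # Advect every puff with the homogeneous wind field
    wind = WIND_SPEED * np.array([np.cos(wind_dir_rad), np.sin(wind_dir_rad)])
    puffs[:, :2] += wind * DT

    # Radial diffusion; concentration later scales as 1/radius
    puffs[:, 2] += GROWTH * DT

    # Gaussian cross-wind perturbation
    cross = np.array([-wind[1], wind[0]]) / WIND_SPEED
    puffs[:, :2] += cross * rng.normal(0.0, JITTER, size=len(puffs))[:, None]

    # Cull puffs that leave the arena (x in [-2, 10] m)
    return puffs[puffs[:, 0] <= 10.0]

def concentration_at(puffs, pos):
    """Sum 1/radius contributions from puffs overlapping pos."""
    d = np.linalg.norm(puffs[:, :2] - pos, axis=1)
    inside = d < puffs[:, 2]
    return float(np.sum(1.0 / puffs[inside, 2])) if inside.any() else 0.0
```

The switch-once and switch-many configurations then amount to changing `wind_dir_rad` over the course of an episode while this update runs unchanged.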
Evaluation and analyses: Each trained agent evaluated on 240 episodes per configuration using varied initial positions/timestamps/headings. Behavioural modules segmented by time since last odour encounter into tracking, recovering, and lost (thresholds chosen by visual inspection). Course-direction distributions computed relative to current wind and local plume centreline to test centreline versus wind alignment. Neural activity dimensionality assessed via PCA on hidden states across configurations. Putative represented variables identified by visualizing neural trajectories coloured by: head direction, steps since last plume encounter (T_last), exponentially weighted moving average (EWMA) of odour concentration (Odour_EWMA), and EWMA of discretized odour encounters (Odour_ENC). Window sizes for EWMA determined via linear regression of neural activity onto features across sliding windows, selecting window with peak R². Importance for action selection quantified by training a random forest classifier on discretized actions (6 classes from turn×move bins) using instantaneous sensory inputs and represented variables; permutation importance (N=30) used for feature ranking. Neural dynamics examined via projections onto leading PCs per episode; gradient flow visualized; structures (limit cycles, funnel-like) identified. Connectivity analysis used eigenvalue spectra of recurrence matrix W_r before vs after training; recurrence Jacobians along trajectories yielded stimulus integration timescales τ_i = 1/|ln|λ_i|| for stable modes; compared trained vs untrained RNNs. Performance comparison between RNNs and MLPs across tasks as a function of MLP history length.
Behavioural modules: Agents exhibit three modules tied to time since the last odour encounter: tracking (frequent odour, <0.5 s since last encounter; rapid upwind approach or 'plume skimming' along the edges), recovering (irregular, large, often cross-wind movements after a short loss of ~0.5 s), and lost (periodic spiralling or oscillating motion after a longer loss, >1 s). Thresholds for lost onset vary across agents: 25–38 steps (1.0–1.52 s).
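The segmentation above amounts to a simple labelling rule on time since the last odour encounter. A minimal sketch follows; the exact thresholds were chosen per agent by visual inspection in the study, so the values here are representative, not definitive:

```python
def segment_module(steps_since_odour, dt=0.04,
                   recover_thresh=0.5, lost_thresh=1.0):
    """Label a time step by time since the last odour encounter.

    dt = 0.04 s matches the agent's 25 Hz control step; the two
    thresholds (in seconds) are representative of the ranges quoted
    above (lost onset spans 1.0-1.52 s across agents).
    """
    t = steps_since_odour * dt
    if t < recover_thresh:
        return "tracking"    # frequent odour: upwind surge / skimming
    if t < lost_thresh:
        return "recovering"  # short loss: large cross-wind movements
    return "lost"            # long loss: spiralling / oscillation
```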
Centreline vs wind tracking: In switch-once and switch-many plumes, empirical course-direction distributions align with the plume centreline (antiparallel, ≈±180°) rather than the instantaneous wind. For switch-once, flights average ~45° off current wind but remain aligned with the centreline; similar trends hold for switch-many. This holds across all five analysed RNN agents.
Neural representations and dimensionality: Population activity is low-dimensional: the first 5–8 PCs explain ~90% of variance (64-D hidden state). The RNN encodes task-relevant variables: head direction; time since last encounter (T_last); Odour_EWMA and Odour_ENC. Optimal EWMA windows (median/typical): Odour_EWMA ~6–12 steps (~0.24–0.48 s; average ~0.3 s) with high linear encoding quality (R² ≈ 0.86–0.92; mean ~0.91). Odour_ENC ~40–62 steps (~1.6–2.5 s; average ~1.9 s) with moderate encoding (R² ≈ 0.51–0.71; mean ~0.59).
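One plausible reading of the window-selection procedure — compute each candidate EWMA feature, decode it linearly from the hidden state by least squares, and keep the window with peak R² — can be sketched as below. The function names and the decoding direction (feature from hidden state) are assumptions for illustration:

```python
import numpy as np

def ewma(x, window):
    """Exponentially weighted moving average with span `window` (steps)."""
    alpha = 2.0 / (window + 1.0)
    out = np.empty_like(x, dtype=float)
    acc = x[0]
    for i, v in enumerate(x):
        acc = alpha * v + (1.0 - alpha) * acc
        out[i] = acc
    return out

def best_ewma_window(odour, hidden, windows):
    """Pick the EWMA window whose feature is best linearly decoded
    from the hidden states (peak R^2 of a least-squares regression).

    odour:  (T,) odour concentration time series
    hidden: (T, n_units) RNN hidden states
    """
    X = np.column_stack([hidden, np.ones(len(hidden))])  # add bias term
    best_w, best_r2 = None, -np.inf
    for w in windows:
        y = ewma(odour, w)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - resid.var() / y.var()
        if r2 > best_r2:
            best_w, best_r2 = w, r2
    return best_w, best_r2
```

With ~6–12-step windows for Odour_EWMA and ~40–62-step windows for Odour_ENC, this kind of sweep reproduces the reported pattern of a sharp, high-R² optimum for the former and a broader, lower-R² optimum for the latter.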
Action prediction importance: Including the represented variables boosts random-forest action-classifier accuracy by 10–18% over instantaneous sensory inputs alone and by 26–51% over majority-class baselines. T_last is consistently among the top two features (ranked alongside the wind input w_x), and time-averaged odour features are more informative than instantaneous odour; Odour_EWMA exceeds Odour_ENC in importance in 4/5 agents.
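The random-forest classifier and the N=30 permutation scheme are presumably off-the-shelf (e.g., scikit-learn's `RandomForestClassifier` with `permutation_importance`), but the permutation scheme itself is simple enough to sketch without any library, given a scoring function for an already-fitted classifier. Names below are illustrative:

```python
import numpy as np

def permutation_importance(score_fn, X, y, n_repeats=30, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    score_fn(X, y) -> accuracy of a *fitted* classifier on (X, y).
    n_repeats=30 mirrors the N=30 permutations described above.
    """
    rng = np.random.default_rng(seed)
    base = score_fn(X, y)          # accuracy on intact features
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy feature j only
            drops.append(base - score_fn(Xp, y))
        imp[j] = np.mean(drops)    # large drop = important feature
    return imp
```

Ranking features (wind inputs, instantaneous odour, T_last, Odour_EWMA, Odour_ENC) by this drop is what places T_last near the top and time-averaged odour above instantaneous odour.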
Neural dynamics regimes: Lost behaviour corresponds to quasi-periodic limit cycles in neural state space; tracking corresponds to quasi-periodic funnel-like structures; recovering occupies an amorphous transition region. These regimes recur across 4/5 agents.
Connectivity and memory: Training pushes multiple eigenvalues of W_r outside the unit circle, including at least one real eigenvalue >1 in all agents, indicating unstable modes that, together with inputs, drive dynamics. Trained RNN stimulus-integration timescales mostly lie within ~12 steps (0.5 s), well within episode length (300 steps), with top timescales for an example agent at 56.5, 13.0, 7.7, 6.8, 5.8 steps. Untrained RNNs can exhibit very long timescales exceeding episode duration.
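The timescale formula τ_i = 1/|ln|λ_i|| for stable modes can be sketched directly. For a tanh RNN, h_t = tanh(W_r h_{t−1} + W_in x_t + b), the per-step recurrence Jacobian along a trajectory is diag(1 − h_t²) W_r, but the function below accepts any square matrix (W_r itself or a Jacobian):

```python
import numpy as np

def integration_timescales(W, top_k=5):
    """Timescales (in time steps) of the stable modes of W.

    For each eigenvalue with |lambda| < 1, a perturbation along its
    mode decays as |lambda|^t, i.e. with timescale
    tau = 1 / |ln|lambda||. Unstable modes (|lambda| >= 1) do not
    decay and are excluded. Returns the top_k longest timescales.
    """
    lam = np.linalg.eigvals(W)
    stable = lam[np.abs(lam) < 1.0]
    tau = 1.0 / np.abs(np.log(np.abs(stable)))
    return np.sort(tau)[::-1][:top_k]
```

At the 25 Hz control rate, the reported top timescales for the example agent (56.5, 13.0, 7.7, 6.8, 5.8 steps) correspond to roughly 2.3 s down to 0.23 s of stimulus memory.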
Memory and performance: RNNs outperform MLPs across all plume tasks; MLP performance improves with longer appended sensory histories, especially in challenging settings (switching winds, sparser odour), demonstrating the importance of memory. Overall, short-timescale memories (<0.5 s) suffice for constant-wind plumes, whereas longer memory improves performance in non-stationary winds.
The study shows that DRL-trained RNN agents can solve dynamic odour plume tracking using emergent behavioural modules and internal representations analogous to those reported in insects. The decomposition into tracking, recovering, and lost modules mirrors upwind surging, cross-wind casting, and U-turn/spiral reacquisition behaviours observed in moths and flies. The centreline-tracking finding addresses a key question for non-stationary environments: agents align with the local plume geometry rather than current wind, predicting that biological centreline tracking should be more apparent under switching winds. Neural population activity is low-dimensional and encodes head direction, time since last odour encounter, and time-averaged odour features across distinct timescales, offering a mechanistic account of how intermittent signals are integrated. Neural dynamics organize into overlapping regimes (limit cycles for lost, funnels for tracking), suggesting continuous, amorphous attractor-like structures rather than discrete fixed points for this task. Connectivity analyses reveal training-induced unstable modes and adjusted integration timescales that enable appropriate memory for plume tracking, explaining why RNNs outperform memory-limited feedforward agents, especially under changing winds. Together, these results provide a normative computational account linking behaviour, memory, and neural dynamics in odour-guided navigation.
By training vanilla RNN actor-critic agents with PPO to track simulated turbulent-like odour plumes, the study uncovers emergent insect-like behavioural modules, testable centreline-tracking strategies under changing winds, low-dimensional neural dynamics, and internal representations of head direction, odour encounter timing, and time-averaged odour signals. Neural dynamics exhibit regime-specific geometries (quasi-limit cycles and funnels), and training sculpts recurrent connectivity to set appropriate stimulus-integration timescales, emphasizing the role of memory, particularly in non-stationary environments. These results offer hypotheses for biological plume tracking, provide reverse-engineering insights for artificial and biological networks, and may inform design of olfactory robots. Future work should incorporate more realistic plume physics, biomechanical bodies, spiking and biologically constrained architectures, multimodal sensing, and new theoretical tools for analysing continuous dynamical regimes in actor-critic RNNs.
The plume simulator, while efficient and sufficiently realistic for behavioural training, is approximate and omits features such as fully filamentous structure and systematic variations of whiff statistics with distance. Agents use vanilla RNN units without biomechanical body models, vision, or spiking neurons; adding biological constraints (e.g., excitation–inhibition balance, Dale’s law, connectome-informed wiring) and multiple sensors (e.g., dual antennae) could change strategies. Analyses rely on dimensionality reduction and proxies for represented variables; alternative latent variables may better explain behaviour. Performance varies across seeds, suggesting sensitivity to training curricula/hyperparameters. Existing fixed-point-based reverse-engineering tools are ill-suited to the continuous, amorphous dynamical structures observed, motivating development of new analysis methods. Finally, generalization to fully 3D environments and richer tasks remains to be established.