Introduction
Locating odour sources in windy environments is a challenging control problem, requiring agents to correct course amidst intermittent signals, changing wind directions, and variable plume shapes. Flying insects excel at this task, navigating long distances to locate food or mates. While experimental studies have explored aspects of this behaviour and its neural circuitry, this research takes a complementary in silico approach, with the goal of developing an integrated understanding of insect plume-tracking behaviour and the underlying neural computations. This approach offers advantages over traditional wind tunnel experiments, which are expensive and time-consuming, especially when generating controlled dynamic plumes and recording high-resolution flight trajectories. Artificial neural networks (ANNs), particularly those trained using deep reinforcement learning (DRL), provide a powerful tool for modelling animal behaviour and neural function. In DRL, an ANN agent receives sensory observations and rewards based on its actions, and learns a strategy that maximizes its total expected reward. This normative approach provides insight into how a neural system *should* behave, complementing descriptive experimental observations. The study uses DRL to train recurrent neural network (RNN) agents to track simulated odour plumes that mimic features of real-world plumes.
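To make the reward-maximisation framing concrete, the sketch below steps a placeholder policy through a toy 2-D environment. The observation layout (ambient wind vector plus odour concentration), the smooth odour field, and the progress-based reward are illustrative stand-ins, not the paper's simulator.

import numpy as np

rng = np.random.default_rng(0)
SOURCE = np.zeros(2)  # odour source at the origin

def observe(pos):
    # Observation: ambient wind vector plus local odour concentration.
    # A smooth decay stands in for the intermittent plume of the paper.
    wind = np.array([-1.0, 0.0])
    odour = np.exp(-np.linalg.norm(pos - SOURCE))
    return np.concatenate([wind, [odour]])

def step(pos, action):
    # Continuous (move, heading) action; reward is progress toward the source.
    move, heading = action
    new_pos = pos + move * np.array([np.cos(heading), np.sin(heading)])
    reward = np.linalg.norm(pos - SOURCE) - np.linalg.norm(new_pos - SOURCE)
    return new_pos, reward

pos, ret = np.array([5.0, 3.0]), 0.0
for _ in range(100):
    obs = observe(pos)                                 # sensory input
    action = rng.uniform([0.0, -np.pi], [0.5, np.pi])  # placeholder random policy
    pos, reward = step(pos, action)
    ret += reward                                      # DRL maximises E[return]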
Literature Review
Several recent studies have employed DRL to train ANNs on tasks inspired by neuroscience, including modelling motor cortex dynamics, hippocampal time encoding, reward-based learning in prefrontal cortex, and task-associated representations across multiple brain areas. The current work builds on these efforts but differs in several key ways. Unlike previous research that focused on static trail tracking or used constant-wind-direction plumes with distractor odours, this study simulates a more challenging dynamic and stochastic odour environment. Whereas earlier work used gated RNNs or spiking neurons, this research employs simpler 'vanilla' RNNs, which facilitate dynamical systems analysis. Abstracting away biomechanical details and connectivity constraints keeps the focus on general principles of plume tracking, and omitting vision and joint-level motor control keeps the networks small, enabling training and analysis on a computationally accessible budget.
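For concreteness, a 'vanilla' RNN updates its hidden state with a single nonlinear map, h_t = tanh(W_h h_{t-1} + W_x x_t + b), rather than the gating equations of LSTMs or GRUs; the single recurrence matrix W_h is what makes eigenvalue-based dynamical systems analysis straightforward. A minimal NumPy sketch with illustrative dimensions and initialisation:

import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 64, 3  # e.g. wind (2 dims) plus odour (1 dim)
W_h = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_hidden))  # recurrence
W_x = rng.normal(0, 1 / np.sqrt(n_input), (n_hidden, n_input))    # input weights
b = np.zeros(n_hidden)

def rnn_step(h, x):
    # Vanilla RNN update: no gates, one recurrence matrix.
    return np.tanh(W_h @ h + W_x @ x + b)

h = np.zeros(n_hidden)
for x in rng.normal(size=(10, n_input)):  # ten random observations
    h = rnn_step(h, x)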
Methodology
A particle-based, two-dimensional plume model is employed; it is computationally tractable and approximates key characteristics of real plumes (intermittency, concentration fluctuations, and a Gaussian cross-section). Actor-critic neural networks receive continuous sensory inputs (egocentric wind velocity and odour concentration) and produce continuous move and turn actions, with parameters roughly matched to fly capabilities. Proximal Policy Optimization (PPO), a robust DRL algorithm, is used to train the agents, which are initialized at random locations and trained on plumes that switch direction multiple times per episode. Trained agents are evaluated on four wind configurations: constant, switch-once, switch-many, and sparse (reduced puff birth rate); additional 'sparser' plumes (reduced birth and diffusion rates) test performance under highly intermittent odour conditions. Behavioural analysis decomposes trajectories into three modules, tracking, lost, and recovering, defined by the time elapsed since the last odour encounter. Neural dynamics analysis focuses on population activity, reducing its dimensionality via principal component analysis (PCA) to identify task-relevant variables represented in the low-dimensional activity. A random forest classifier then predicts agent actions from instantaneous sensory observations together with the identified represented variables, with their contributions assessed through permutation importance scores. Finally, the study investigates RNN connectivity, examining eigenvalue spectra of the recurrence matrix and stimulus integration timescales, and compares RNN performance to that of feedforward multilayer perceptron (MLP) networks given varying lengths of sensory history. Minimal sketches of several of these analysis steps follow.
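First, the module-labelling rule: each timestep is assigned to tracking, recovering, or lost according to the time since the last odour encounter. The timestep, detection threshold, and cutoff values below are illustrative, not the paper's exact parameters.

import numpy as np

def label_modules(odour, dt=0.04, detect=0.01, track_window=0.25, lost_after=1.0):
    # Label each timestep by the time elapsed since odour last exceeded
    # a detection threshold; all parameter values here are illustrative.
    labels, since = [], np.inf
    for c in odour:
        since = 0.0 if c > detect else since + dt
        if since < track_window:
            labels.append("tracking")    # recent odour contact
        elif since < lost_after:
            labels.append("recovering")  # contact lost only briefly
        else:
            labels.append("lost")        # long stretch without odour
    return labels

print(label_modules([0.5, 0.0, 0.02, 0.0] + [0.0] * 30))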
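Second, the population-activity analysis can be sketched with scikit-learn's PCA applied to a matrix of hidden states (timesteps x neurons). The data here is synthetic; in practice the leading components would be regressed against candidate task variables.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
H = rng.normal(size=(5000, 64))   # stand-in for recorded RNN hidden states

pca = PCA(n_components=10)
Z = pca.fit_transform(H)          # low-dimensional population trajectories
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance captured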
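Third, the action-prediction step can likewise be sketched with scikit-learn: fit a random forest on instantaneous observations plus a represented internal variable, then score each feature by permutation importance. All features and the turn label below are synthetic, constructed so that the internal EWMA variable drives the action and should therefore dominate the importance scores.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(size=n),   # wind x (instantaneous observation)
    rng.normal(size=n),   # wind y (instantaneous observation)
    rng.random(n),        # odour concentration (instantaneous observation)
    rng.random(n),        # EWMA of odour encounters (represented variable)
])
y = (X[:, 3] + 0.1 * rng.normal(size=n) > 0.5).astype(int)  # synthetic turn label

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
for name, score in zip(["wind_x", "wind_y", "odour", "enc_ewma"],
                       imp.importances_mean):
    print(f"{name}: {score:.3f}")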
Key Findings
Trained RNN agents successfully localized odour sources across varying wind conditions and plume sparsities. Emergent behaviours included tracking (straight-line pursuit or plume skimming), lost (spiralling or oscillating), and recovering (large cross-wind movements). Successful tracking in switching-wind plumes revealed that agents follow the plume centreline rather than merely the current wind direction. PCA revealed low-dimensional neural activity representing task-relevant variables beyond instantaneous sensory inputs, including head direction, time since last odour encounter, an exponentially weighted moving average (EWMA) of odour concentration, and an EWMA of odour encounters. Adding these represented variables to a random forest classifier significantly improved action-prediction accuracy over using instantaneous sensory observations alone. Neural dynamics exhibited structured regimes: funnel-like structures during tracking and quasi-periodic limit cycles during lost behaviour. Connectivity analysis showed that training introduced unstable eigenvalues into the recurrence matrix, which drive the network dynamics. Stimulus integration timescales were predominantly short (under 0.5 s), suggesting that short-timescale memory suffices for constant-wind plumes, whereas longer-term memory is crucial for switching-wind plumes. RNNs consistently outperformed MLPs, highlighting the importance of recurrence and internal memory for complex plume-tracking tasks.
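The EWMA variables found in the population activity are simple recursive filters, m_t = (1 - alpha) m_{t-1} + alpha x_t. A minimal sketch with an illustrative smoothing factor, applied both to raw concentration and to binarized encounters:

import numpy as np

def ewma(x, alpha=0.1):
    # Recursive filter: m_t = (1 - alpha) * m_{t-1} + alpha * x_t
    m, out = 0.0, []
    for v in x:
        m = (1 - alpha) * m + alpha * v
        out.append(m)
    return np.array(out)

odour = np.array([0.0, 0.8, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0])
print(ewma(odour))                          # EWMA of odour concentration
print(ewma((odour > 0.05).astype(float)))   # EWMA of binary odour encounters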
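The link between the eigenvalue spectrum and integration timescales can be made explicit: for a linearised discrete-time RNN, a mode with eigenvalue lambda decays with timescale tau = -dt / ln|lambda|, and |lambda| > 1 marks an unstable, dynamics-driving direction. A sketch on a random recurrence matrix (the timestep and matrix are illustrative stand-ins for a trained network):

import numpy as np

rng = np.random.default_rng(0)
dt = 0.04                                          # illustrative timestep (s)
W_h = rng.normal(0, 1.05 / np.sqrt(64), (64, 64))  # stand-in recurrence matrix

lam = np.linalg.eigvals(W_h)
stable = np.abs(lam) < 1
tau = -dt / np.log(np.abs(lam[stable]))  # decay timescale of each stable mode
print(f"unstable modes: {np.sum(~stable)}")
print(f"longest stable timescale: {tau.max():.3f} s")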
Discussion
The findings demonstrate that the trained RNN agents exhibit behavioural and neural features remarkably similar to those observed in flying insects. The decomposition of behaviour into distinct modules mirrors the upwind surging, cross-wind casting, and U-turn behaviours of insects. The centreline-tracking strategy challenges previous models and yields a testable hypothesis: centreline tracking should be more prominent in insects tracking plumes under switching-wind conditions. The represented variables, including head direction, time since last encounter, and EWMAs of odour cues, align with known biological mechanisms. The low-dimensional neural dynamics and their structured regimes (funnel and limit cycle) corroborate neurobiological observations. The analysis of memory requirements supports the idea that short-term memory suffices for simpler tasks, while complex scenarios necessitate longer-term memory. Together, these results advance the understanding of both biological and artificial odour navigation.
Conclusion
This research demonstrates that DRL-trained RNN agents can effectively solve a stochastic plume-tracking task while exhibiting biologically plausible behaviours and neural dynamics. The findings provide insight into the computational mechanisms of odour plume tracking and suggest directions for future experiments. Future work should explore more realistic plume simulations, incorporate biological constraints into the network architecture (e.g., spiking networks, biomechanical models), investigate multitask learning, and develop more sophisticated methods for characterizing continuous RNN dynamics.
Limitations
The plume simulator, while computationally efficient, is an approximation: it does not capture all aspects of real plumes, such as filamentous structure or variation in whiff duration and frequency. The use of vanilla RNNs and the absence of a biomechanical body model are further simplifications. The observed performance variability across agents highlights the need for further investigation into training algorithms and curricula, and additional methodological development is needed to better analyze continuous RNN dynamics.