Learning interpretable dynamics of stochastic complex systems from experimental data
T. Gao, B. Barzel, et al.
This study presents the Langevin Graph Network Approach (LaGNA), developed by Ting-Ting Gao, Baruch Barzel, and Gang Yan, for inferring stochastic differential equations of complex networks from empirical data. The method outperforms existing techniques in equation inference accuracy and yields insights into bird flock dynamics and tau pathology in mice.
Introduction
Many biological, physical, and social systems exhibit nonlinear and stochastic behaviors driven by interactions on complex networks. Stochastic differential equations (SDEs) are a natural framework for describing both deterministic evolution and random fluctuations. However, conventional SDE models often rely on predefined functional forms and assumed parameters, which limits how faithfully they capture real systems. With the growing availability of empirical observations (both network topology and node activity time series), there is an opportunity to infer governing SDEs directly from data. Prior AI-driven equation discovery has advanced the identification of ODEs and PDEs, mainly for deterministic or few-body systems, and has been extended to networked ODEs. Methods tailored to stochastic systems have largely emphasized trajectory prediction rather than extracting explicit SDEs. The study poses a central question: given the observed network topology and node activity time series, how can one infer coupled SDEs that capture the hidden stochastic dynamics of a complex system? The work aims to learn interpretable, explicit networked SDEs from data and to validate them on real-world systems.
Literature Review
Data-driven discovery of dynamics has progressed through sparse regression (e.g., SINDy), physics-informed neural networks, and symbolic model discovery, primarily for deterministic ODE/PDE systems or small-scale settings. For networks, methods have inferred ODE-based dynamics or required extensive trials. Efforts to learn stochastic dynamics (e.g., SDE-Net, SVISE, SFI) often estimate drift and diffusion fields for prediction without yielding explicit interpretable SDEs that separate self, interaction, and diffusion components. Many prior validations used simulated systems with known ground truth, with few demonstrations on real stochastic systems. Recent macroscopic approaches learn coarse-grained dynamics for stochastic dissipative systems but do not address node-level microscopic equations on networks. This work advances the literature by extracting explicit, interpretable networked SDEs from observational data, including real-world applications.
Methodology
The authors introduce LaGNA (Langevin Graph Network Approach), a two-stage framework. Stage 1 is an implicit dynamical learner with a graph-guided message-passing architecture that decomposes the dynamics into three neural modules: (i) self-dynamics f(·); (ii) interaction dynamics g(·,·) acting along the network edges defined by the adjacency matrix A; and (iii) state-dependent diffusion Φ(·). For node i with d-dimensional state x_i(t), the model updates states via x_i(t+dt) = x_i(t) + [f(x_i(t)) + Σ_j A_{ij} g(x_i(t), x_j(t))] dt + Φ(x_i(t)) dW_i, where W_i is a d-dimensional Wiener process. The network is trained by maximizing the log-likelihood of the next-step observations under a normal distribution parameterized by the predicted mean and variance (or covariance for d > 1), which avoids overfitting to any single stochastic realization; the loss is the corresponding univariate or multivariate normal negative log-likelihood, with μ and Σ determined by f, g, and Φ. Message passing uses A_{ij} to route information from sender j to receiver i and aggregates over neighbors to estimate the interaction contribution. The framework handles signed and weighted networks by incorporating link types or weights into the interaction modeling, e.g., separate NN modules for excitatory versus inhibitory links and a weighted adjacency matrix for heterogeneous interactions.
Stage 2 probes the trained modules to extract explicit equations. Using pre-constructed libraries of elementary functions for the self (L_F), interaction (L_G), and diffusion (L_Φ) terms, the approach regresses the module outputs against the corresponding time-varying design matrices Θ_F, Θ_G, and Θ_Φ. A two-phase procedure first performs global sparse regression (LASSO with cross-validation) to rank the relevant terms, then incrementally builds a minimal-term model for each component, adding terms while monitoring a regression score κ² until it no longer improves. The resulting concise expressions for f, g, and Φ form the final SDE.
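To make the two-phase extraction concrete, here is a minimal sketch for a single self-dynamics component with a scalar state. Several pieces are assumptions rather than the authors' code: the hypothetical library in `build_library`, a fixed LASSO penalty `alpha` used in place of cross-validation, the coefficient of determination R² used as a stand-in for the paper's score κ², and the stopping tolerance `tol`. The target `y` plays the role of sampled outputs of a trained self-dynamics module.

```python
import numpy as np

def build_library(x):
    """Hypothetical candidate library L_F for a scalar self-dynamics term."""
    names = ["x", "x^2", "x^3", "x^4", "cos(x)"]
    theta = np.column_stack([x, x**2, x**3, x**4, np.cos(x)])
    return names, theta

def lasso_cd(X, y, alpha, n_sweeps=500):
    """Coordinate-descent LASSO: minimize (1/2n)||y - Xw||^2 + alpha*||w||_1.
    Assumes the columns of X are standardized (mean 0, variance 1)."""
    n, p = X.shape
    w = np.zeros(p)
    r = y.copy()
    for _ in range(n_sweeps):
        for k in range(p):
            r += X[:, k] * w[k]                    # residual without term k
            rho = X[:, k] @ r / n
            w[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft threshold
            r -= X[:, k] * w[k]
    return w

def two_phase_fit(x, y, alpha=0.01, tol=1e-4):
    """Phase 1: rank terms by |LASSO coefficient|.
    Phase 2: add terms in that order while R^2 improves by more than tol."""
    names, theta = build_library(x)
    Tc = theta - theta.mean(axis=0)                # centering = implicit intercept
    yc = y - y.mean()
    w = lasso_cd(Tc / Tc.std(axis=0), yc, alpha)
    order = np.argsort(-np.abs(w))                 # most relevant terms first
    chosen, best, coef = [], -np.inf, None
    for k in order:
        trial = chosen + [k]
        c, *_ = np.linalg.lstsq(Tc[:, trial], yc, rcond=None)
        resid = yc - Tc[:, trial] @ c
        score = 1.0 - resid @ resid / (yc @ yc)    # R^2, stand-in for kappa^2
        if score - best <= tol:
            break                                  # no meaningful improvement
        chosen, best, coef = trial, score, c
    return {names[k]: float(v) for k, v in zip(chosen, coef)}, best


rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 2000)
y = -x + 0.5 * x**3 + rng.normal(0, 0.01, x.size)  # stand-in for module output
terms, score = two_phase_fit(x, y)
print(terms, score)
```

With y = -x + 0.5x³ plus small noise, phase 1 should rank x³ and x highest, and phase 2 should stop after those two terms with coefficients close to 0.5 and -1, because any third library term barely changes R².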
For flocking, the architecture is extended to a second-order model with specialized NNs for self-propulsion, cohesion, and alignment, trained with a composite loss that combines the negative log-likelihood with errors on displacement, velocity, and acceleration; this enables learning second-order SDEs akin to the Vicsek model. For tau pathology, the model accounts for bidirectional spread along neuroanatomical connections (retrograde and anterograde) and for spatial proximity (Euclidean distance), yielding an explicit heterogeneous diffusion equation that combines these pathways with a time-varying propagation rate and stochasticity. In the benchmarks, equation inference accuracy is quantified by the symmetric mean absolute percentage error (SMAPE) between the inferred and true coefficient sets.
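SMAPE has several variants in the literature. A minimal sketch of one common form, assuming each equation is represented as a mapping from term name to coefficient and that a term absent from one set counts as a zero coefficient (so a spurious or missed term contributes the maximum per-term error of 1):

```python
def smape(inferred: dict, true: dict) -> float:
    """Symmetric mean absolute percentage error between two coefficient sets.

    Uses mean(|p - t| / (|p| + |t|)) over the union of terms, so each term
    contributes a value in [0, 1]; terms missing from one set count as 0.
    """
    terms = set(inferred) | set(true)
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        p = inferred.get(term, 0.0)
        t = true.get(term, 0.0)
        denom = abs(p) + abs(t)
        total += abs(p - t) / denom if denom > 0 else 0.0
    return total / len(terms)


true_eq = {"x": -1.0, "x^3": 0.5}
inferred_eq = {"x": -0.9, "x^3": 0.5, "x^2": 0.05}
print(round(smape(inferred_eq, true_eq), 3))  # 0.351
```

Here the per-term errors are 0.1/1.9 for x, 0 for x³, and 1 for the spurious x² term, so the spurious term dominates the score. This is why a method that returns a compact, correct set of terms can beat a denser fit by a wide SMAPE margin.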
Key Findings
- Simulated signed neuronal networks (Hindmarsh-Rose dynamics): LaGNA accurately separated and inferred self, diffusion, and distinct excitatory/inhibitory interaction terms from a single trial, reproducing the force field and stochastic trajectories.
- Simulated weighted networks (stochastic Rössler): The inferred SDE reproduced trajectories and force fields closely matching ground truth on a 20-node weighted network.
- Benchmark comparison on stochastic Lorenz network dynamics: Against Modified-SINDy, Two-Phase inference, SDE-Net, SVISE, and SFI, LaGNA achieved substantially lower inference error (roughly two orders of magnitude lower SMAPE) while also yielding explicit expressions that separate self, interaction, and diffusion terms.
- Empirical bird flocks (homing pigeons tracked by GPS at 0.2 s resolution): Extending LaGNA to a second-order formulation, the method inferred self-propulsion, alignment, and cohesion functions from one flock dataset. The inferred SDE closely resembled the second-order Vicsek model and reproduced force fields and long-term collective behaviors. The learned equation generalized to three additional flocks by tuning only the scaling coefficients while preserving the equation structure. This provides strong empirical evidence that the Vicsek model captures genuine flocking dynamics.
- Tau pathology diffusion in the mouse brain (PHF tau injections; 1, 3, 6, and 9 months post-injection): The inferred explicit equation combined retrograde diffusion, anterograde diffusion, and spatial (Euclidean) proximity-based diffusion with a time-varying rate and a stochastic noise term. It accurately predicted the area occupied by pathology at later times (6 and 9 months) and showed injection-site specificity: predictions were most accurate when seeded at the true experimental injection regions, compared with 500 random 5-region seeds. A degenerate model that ignored regional heterogeneity performed worse. In LRRK2 G2019S mutant mice, the same equation form fit the data but revealed a strong retrograde preference, with average coefficient magnitudes b1 ≈ 0.7–0.9 (retrograde) versus b2 ≈ 0.1–0.3 (anterograde), consistent with recent experimental observations.
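The simulated benchmarks above rest on two ingredients from Stage 1: integrating a networked SDE of the form x_i(t+dt) = x_i(t) + [f(x_i) + Σ_j A_{ij} g(x_i, x_j)] dt + Φ(x_i) dW_i, and scoring observed one-step transitions with a Gaussian likelihood. The sketch below illustrates both for a scalar state (d = 1) with an Euler–Maruyama step. The functions `f_self`, `g_inter`, and `phi_diff` are hypothetical stand-ins (in LaGNA they would be the trained neural modules), and the transition density N(x + drift·dt, Φ²·dt) is the standard Euler–Maruyama one, assumed here rather than taken from the paper's code.

```python
import numpy as np

# Hypothetical components for a scalar state per node (d = 1); in LaGNA these
# would be the trained neural modules f, g and Phi.
def f_self(x):
    return -x                                   # self-dynamics: relax toward 0

def g_inter(xi, xj):
    return np.sin(xj - xi)                      # interaction along edge j -> i

def phi_diff(x):
    return 0.1 + 0.05 * np.abs(x)               # state-dependent noise amplitude

def drift(x, A):
    """f(x_i) + sum_j A_ij * g(x_i, x_j) for every node i (x has shape (n,))."""
    pair = g_inter(x[:, None], x[None, :])      # pair[i, j] = g(x_i, x_j)
    return f_self(x) + (A * pair).sum(axis=1)

def simulate(x0, A, dt, steps, rng):
    """Euler-Maruyama: x(t+dt) = x(t) + drift*dt + Phi(x(t)) * dW, dW ~ N(0, dt)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        xs.append(x + drift(x, A) * dt + phi_diff(x) * dW)
    return np.array(xs)                         # shape (steps + 1, n)

def gaussian_nll(xs, A, dt, diff_fn):
    """Mean negative log-likelihood of each observed step x(t) -> x(t+dt)
    under N(x + drift*dt, diff_fn(x)**2 * dt)."""
    x, x_next = xs[:-1], xs[1:]
    mu = x + np.array([drift(xt, A) for xt in x]) * dt
    var = diff_fn(x) ** 2 * dt
    return float(np.mean(0.5 * np.log(2 * np.pi * var)
                         + (x_next - mu) ** 2 / (2 * var)))


rng = np.random.default_rng(0)
n = 10
A = (rng.random((n, n)) < 0.3).astype(float)    # random directed adjacency
np.fill_diagonal(A, 0.0)
xs = simulate(rng.normal(size=n), A, dt=0.01, steps=2000, rng=rng)
nll_true = gaussian_nll(xs, A, 0.01, phi_diff)
nll_const = gaussian_nll(xs, A, 0.01, lambda x: 0.5 * np.ones_like(x))
print(nll_true < nll_const)  # True
```

Because the data were generated with `phi_diff`, the likelihood under the true diffusion is better (lower NLL) than under a mis-specified constant amplitude. This is the signal that lets likelihood-based training pin down Φ as well as the drift, rather than fitting one noisy trajectory point by point.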
Discussion
The study addresses the core challenge of inferring explicit, interpretable SDEs governing complex networked systems directly from observational data. By separating self, interaction, and diffusion contributions via a topology-guided message passing architecture and subsequently extracting concise symbolic forms, LaGNA bridges predictive performance with interpretability. The findings show that this approach can accurately recover known stochastic dynamics in simulated signed/weighted networks and outperform state-of-the-art methods in equation inference accuracy. Crucially, it uncovers real-world governing equations: for flocking, it independently recovers an SDE closely mirroring the second-order Vicsek model, substantiating its applicability to real bird flocks; for tau pathology, it identifies distinct contributions of retrograde, anterograde, and spatial diffusion and reveals mutation-specific directionality. These results demonstrate that interpretable SDE discovery from data can yield mechanistic insights and facilitate downstream tasks such as forecasting and potentially control in complex systems.
Conclusion
The paper introduces LaGNA, a two-stage framework for learning interpretable stochastic dynamics on networks. It (i) separates dynamical sources (self, interaction, diffusion) using a graph-guided neural architecture trained via likelihood maximization, and (ii) extracts concise symbolic expressions using a library-based two-phase inference procedure. LaGNA accurately reconstructs governing SDEs in simulated signed and weighted systems and surpasses five baseline methods. Applied to empirical data, it provides compelling evidence that the second-order Vicsek model captures real flocking dynamics, and it derives an explicit heterogeneous diffusion equation for tau pathology that predicts progression and differentiates mutant from nontransgenic dynamics. Future work can extend LaGNA to jointly infer network topology and dynamics from limited data, better disentangle intrinsic versus extrinsic noise, reduce reliance on pre-constructed libraries, and incorporate higher-order interactions while managing model complexity.
Limitations
- Observability: Incomplete node activity time series can hinder inference; the minimal sub-network that must be observed for reliable inference has yet to be determined.
- Noise: Real data include intrinsic stochasticity and extrinsic measurement noise. Without prior knowledge, all noise was treated as intrinsic. LaGNA performs well when extrinsic noise is below ~10% relative strength; stronger extrinsic noise benefits from denoising (e.g., Kalman–Takens). Better handling of extrinsic noise remains a challenge.
- Topology availability: When network topology is unavailable, jointly inferring topology and dynamics is necessary. Existing approaches often require many trials or focus on prediction rather than interpretable inference; achieving both from limited data is challenging.
- Library dependence: Pre-constructed libraries may miss relevant terms. While symbolic regression could help, it faces scalability issues in higher dimensions; automation improvements are needed.
- Higher-order interactions: Extending LaGNA to handle higher-order (e.g., triplet) interactions is conceptually straightforward but increases search complexity for optimal equations.