Physics
Discovering sparse interpretable dynamics from partial observations
P. Y. Lu, J. A. Bernad, et al.
The paper addresses the problem of discovering governing equations and reconstructing hidden states when only a subset of a nonlinear dynamical system's state variables is observed. Traditional system identification performs well for linear systems, but nonlinear identification and state reconstruction remain difficult, particularly with limited or partial data. Black-box deep learning approaches can predict dynamics but often require large datasets, generalize poorly outside their training distributions, and lack interpretability, though physics-informed inductive biases (e.g., symmetries) can help. The authors target interpretable system identification under partial observability by exploiting the structure inherent in such problems: hidden variables are reconstructed and sparse governing equations identified simultaneously. They propose an encoder that reconstructs hidden states and a sparse symbolic model of the dynamics, trained jointly by matching higher-order time derivatives computed symbolically against finite-difference estimates from the observed data.
The paper situates itself among several strands of prior work. Deep learning has achieved strong predictive performance for nonlinear dynamics, including with partial observations, but often lacks interpretability and requires large datasets unless enriched with physical priors and symmetries. Koopman operator-based system identification offers a linear perspective and a framework to blend with neural networks, but struggles with systems exhibiting features like chaos that induce a continuous spectrum not captured by finite-dimensional linear systems, though refinements exist. Sparse symbolic identification methods (e.g., SINDy and variants) directly seek parsimonious governing equations, yielding interpretable models that generalize well. Prior works have shown sparsity priors can produce parsimonious nonlinear models and, combined with autoencoders, extract interpretable latent variables. This work builds on the sparse symbolic paradigm, extending it to partially observed settings by coupling a hidden-state encoder with a symbolic dynamics model and training via derivative matching.
The framework comprises two components: (1) an encoder eθ that reconstructs hidden states from sequences of visible states; and (2) a sparse symbolic model F representing the governing equations. Given visible states x(t) = g(χ(t)) and unknown hidden states, the encoder processes a local window of observed states to reconstruct hidden variables φ(t), which are combined with the visible part via a known aggregation function a(·) to form the full reconstructed state χ = a(x, φ). The dynamics are modeled symbolically as dχ/dt = F(χ) = Σi θi fi(χ), where the fi are predefined library terms (e.g., constants, monomials, and spatial derivatives for PDEs) and the θi are learnable coefficients. The dimensionality of the hidden state can be treated as a hyperparameter or guided by intrinsic dimensionality estimation.

Training uses only partial observations by matching higher-order time derivatives of the visible states: symbolic derivatives, obtained via automatic differentiation through F, are compared to finite-difference estimates from the data. The loss is a weighted mean squared error over derivative orders p = 1..P between symbolic and finite-difference derivatives, with per-order weights αp and variance normalization. Sparsity in F is enforced by iterative hard thresholding: coefficients with magnitude below θ_thres are set to zero at regular intervals during training. L1 regularization was also tested but degraded performance when strong enough to induce sparsity.

Higher-order symbolic time derivatives are computed without explicit integration via an automatic-differentiation trick: define a wrapper Z(x, ε) that is the identity at ε = 0 but carries the custom derivative rule ∂Z/∂ε = F(Z). Then d^n x/dt^n = ∂^n Z/∂ε^n |_{ε=0} can be computed by standard automatic differentiation (implemented in JAX), yielding exact symbolic time derivatives efficiently.

Implementation details: ODE experiments use time series sampled for 10,000 steps (Δt = 1e-2), normalized to unit variance.
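The ε-wrapper trick above can be sketched with JAX's `custom_jvp` interface on a toy scalar system. The dynamics F, the wrapper Z, and the helper `nth_time_derivative` below are illustrative stand-ins, not the authors' implementation:

```python
import jax

# Toy dynamics dx/dt = F(x); F(x) = -x stands in for the learned
# symbolic model (illustrative only).
def F(x):
    return -x

# Wrapper Z(x, eps): identity at eps = 0, with the custom derivative
# rule dZ/deps = F(Z). Repeated differentiation in eps then yields the
# exact time derivatives d^n x/dt^n via the chain rule.
@jax.custom_jvp
def Z(x, eps):
    return x

@Z.defjvp
def Z_jvp(primals, tangents):
    x, eps = primals
    dx, deps = tangents
    z = Z(x, eps)
    return z, dx + F(z) * deps

def nth_time_derivative(x0, n):
    # n nested forward-mode derivatives in eps, evaluated at eps = 0
    g = lambda e: Z(x0, e)
    for _ in range(n):
        g = (lambda h: (lambda e: jax.jvp(h, (e,), (1.0,))[1]))(g)
    return g(0.0)

# For F(x) = -x, d^n x/dt^n = (-1)^n x
print(nth_time_derivative(2.0, 1))  # -2.0
print(nth_time_derivative(2.0, 2))  #  2.0
```

Because the wrapper's primal is the identity, no integration is ever performed; each additional derivative order simply adds one more application of the chain rule through F.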
The encoder for the ODE systems is a 1D temporal CNN taking nine-frame windows {x(t−4Δt), …, x(t+4Δt)} to reconstruct φ(t), with three convolutional layers (kernel sizes 9-1-1; channels 128-128-1), enforcing temporal locality. The symbolic library contains constant, linear, and quadratic monomials (1, u, v, w, u^2, v^2, w^2, uv, uw, vw). The effective time step in the symbolic model is scaled by a factor of 10 to improve conditioning. Training: 50,000 steps with the AdaBelief optimizer, lr = 1e-3, α1 = α2 = 1 (weights for p > 2 set to 0); sparsification every 5,000 steps with θ_thres = 1e-3.

PDE experiments use 64×64 grids (Δx = Δy = 1) for 1,000 time steps (Δt = 5e-2), normalized to unit variance. The encoder is a 3D spatiotemporal CNN with three convolutional layers (kernel sizes 5-1-1; channels 64-64-1), enforcing locality in space and time. The symbolic library contains constant, linear, and quadratic terms plus spatial derivatives up to second order (∂x u, ∂y u, ∂x^2 u, ∂y^2 u, and similarly for v). The time step and spatial grid spacing in the symbolic model are scaled by a factor of 10 for conditioning. Diffusion system training: 50,000 steps, lr = 1e-4, α1 = 1, α2 = 10; sparsification every 1,000 steps with θ_thres = 5e-3. Diffusive Lotka-Volterra training: 100,000 steps, lr = 1e-3, α1 = α2 = 1; sparsification every 1,000 steps with θ_thres = 2e-3.

Phase reconstruction (1D nonlinear Schrödinger) uses data on a 64-point spatial mesh (Δx = 2π/64) for 500 time steps (Δt = 1e-3). Prior knowledge constrains the symbolic library to spatial derivatives ∂x^p ψ for p ∈ {1, 2, 3, 4} and odd nonlinearities of the form |ψ|^q ψ with q ∈ {2, 4, 6, 8}; global phase-shift symmetry is assumed. Instead of a neural encoder, a direct embedding is used: a phase parameter is learned at each (x, t), which is flexible but generalizes and scales poorly with dataset size. An encoder regularization term R_enc = β Σi (∂t x̂_i − Δ x̂_i)^2 with β = 10^3 is added to align the symbolic and finite-difference derivatives of the reconstructed state. Training: 100,000 steps, lr = 1e-4, α1 = α2 = 1; sparsification every 10,000 steps with θ_thres = 1e-3.
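As a schematic sketch (not the authors' code), the sparse symbolic model over the quadratic monomial library for a three-state ODE, together with the iterative hard thresholding used to enforce sparsity, might look like the following; the Lorenz coefficients are used only to exercise the library:

```python
import numpy as np

# Quadratic monomial library for a three-state system (u, v, w),
# matching the terms listed in the text.
def library(state):
    u, v, w = state
    return np.array([1.0, u, v, w, u*u, v*v, w*w, u*v, u*w, v*w])

def F(state, theta):
    # dx/dt = sum_i theta_i * f_i(x); theta: (3, 10), one row per variable
    return theta @ library(state)

def sparsify(theta, thres=1e-3):
    # Iterative hard thresholding: zero out coefficients below thres
    # (applied at regular intervals during training)
    out = theta.copy()
    out[np.abs(out) < thres] = 0.0
    return out

# Example: Lorenz-system coefficients expressed in this library
theta = np.zeros((3, 10))
theta[0, 1], theta[0, 2] = -10.0, 10.0                     # du/dt = 10(v - u)
theta[1, 1], theta[1, 2], theta[1, 8] = 28.0, -1.0, -1.0   # dv/dt = 28u - v - uw
theta[2, 3], theta[2, 7] = -8.0 / 3.0, 1.0                 # dw/dt = uv - (8/3)w
print(F(np.array([1.0, 2.0, 3.0]), theta))                 # [10. 23. -6.]
```

In training, θ would be fit by the derivative-matching loss rather than set by hand, with `sparsify` applied every few thousand steps as described above.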
Code and data-generation scripts are available at https://github.com/peterparity/symder. Reported training times: ~2.5 min for ODEs on one RTX 2080 Ti; ~2 h for PDEs on four GPUs.
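The derivative-matching objective described above can be sketched with central finite differences for orders p = 1, 2; the variance normalization is omitted for brevity and the names are illustrative:

```python
import numpy as np

# Central finite-difference estimates of the first and second time
# derivatives of a sampled trajectory x (valid on the interior x[1:-1])
def finite_difference_derivs(x, dt):
    d1 = (x[2:] - x[:-2]) / (2.0 * dt)
    d2 = (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2
    return d1, d2

# Weighted MSE over derivative orders between symbolic and
# finite-difference derivatives (variance normalization omitted)
def derivative_matching_loss(sym, fd, alphas=(1.0, 1.0)):
    return sum(a * np.mean((s - f) ** 2) for a, s, f in zip(alphas, sym, fd))

# Sanity check on x(t) = sin(t), whose exact derivatives are cos(t), -sin(t)
t = np.linspace(0.0, 1.0, 101)
fd1, fd2 = finite_difference_derivs(np.sin(t), t[1] - t[0])
loss = derivative_matching_loss((np.cos(t[1:-1]), -np.sin(t[1:-1])), (fd1, fd2))
print(loss < 1e-8)  # True
```

In the actual framework the "symbolic" side comes from automatic differentiation through F and the encoder, so gradients of this loss flow into both the coefficients θ and the encoder weights.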
- ODE systems (Rössler and Lorenz; visible u,v with hidden w): The method correctly identifies the governing equations and reconstructs the hidden state. Hidden-state reconstruction error: 4.6×10^-4 (Rössler; relative to hidden-state range) and 1.7×10^-3 (Lorenz). Discovered hidden states can differ by an affine transform; aligning via linear regression yields close agreement and accurate symbolic equations.
- PDE systems:
  - 2D diffusion with an exponentially decaying source v: identified equations closely match the ground truth (e.g., ∂t u ≈ 0.200 ∂x^2 u + 0.200 ∂y^2 u + 0.999 v; ∂t v ≈ −0.100 v). Relative hidden-source reconstruction error: 1.4×10^-4.
  - 2D diffusive Lotka-Volterra predator-prey: diffusion and reaction terms accurately identified (e.g., ∂t v ≈ 0.010 ∂x^2 v + 0.010 ∂y^2 v − 0.988 v + 0.991 u v). Hidden-species reconstruction error: 1.0×10^-3. Encoder reconstructions are slightly blurry, reflecting the system's greater nonlinearity and complexity.
- Nonlinear Schrödinger (phase reconstruction from amplitude only): Correctly identified the governing equation form with parameters close to true (reconstructed ∂t ψ ≈ −0.52 ∇^2 ψ − 1.07 |ψ|^2 ψ vs true −0.5 and −1). Phase reconstruction has relative error ≈0.35, but spatial phase derivative error is significantly lower (≈0.057). Identified equations enable potential post-processing with specialized phase-retrieval algorithms to improve phase accuracy.
- Computational aspects: The derivative-matching approach avoids explicit integration, improving efficiency. Accurate results were obtained matching first- and second-order derivatives; higher orders may be needed with more hidden variables.
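Since the reconstructed hidden state noted in the ODE results is only determined up to an affine transform, comparison with the ground truth requires alignment first. A minimal least-squares sketch, assuming a scalar hidden variable (names are illustrative):

```python
import numpy as np

# Align a reconstructed hidden trajectory phi_hat to the ground truth
# phi_true by fitting an affine map a*phi_hat + b via least squares
def affine_align(phi_hat, phi_true):
    A = np.stack([phi_hat, np.ones_like(phi_hat)], axis=1)
    coef, *_ = np.linalg.lstsq(A, phi_true, rcond=None)
    return A @ coef

phi_true = np.linspace(0.0, 1.0, 50)
phi_hat = -2.0 * phi_true + 0.5   # same state up to an affine transform
aligned = affine_align(phi_hat, phi_true)
print(np.max(np.abs(aligned - phi_true)) < 1e-10)  # True
```

After alignment, the reconstruction errors quoted above can be computed directly against the true hidden trajectories.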
The proposed framework successfully discovers sparse, interpretable governing equations and reconstructs hidden states from partial observations across diverse ODE and PDE systems. By fitting symbolic models, it captures the exact functional forms underlying the dynamics, providing interpretability and facilitating physical insight. The derivative-matching strategy, enabled by an automatic-differentiation trick, avoids integration overhead and issues like stiffness that can complicate integrator-based training, yielding computational efficiency. However, reliance on higher-order finite-difference derivatives makes the approach more susceptible to noise; smoothing and careful sparsity tuning mitigate this, and emerging methods that jointly infer noise distributions could further improve robustness. The framework is flexible: known constraints (e.g., symmetries, restricted libraries) can be injected to improve data efficiency and identifiability, as demonstrated in the phase-reconstruction example. Future enhancements include more robust encoders (e.g., variational or data-assimilation inspired) and richer symbolic architectures (e.g., compositional symbolic regression units) to broaden the class of discoverable dynamics.
The study introduces an end-to-end machine learning framework that, from partial observations, reconstructs hidden states and discovers sparse, interpretable governing equations by combining a learned encoder with a sparse symbolic dynamics model trained via higher-order derivative matching. Experiments on chaotic ODEs, PDEs with diffusion and reaction dynamics, and a nonlinear Schrödinger phase-reconstruction task demonstrate accurate equation discovery and hidden-state recovery from partial data. The method is computationally efficient and adaptable to domain priors, yielding models that generalize and provide physical interpretability. Future work will focus on improving robustness to noise (e.g., integrating noise-identification techniques), developing stronger and more generalizable encoders (including variational approaches), and exploring more flexible symbolic-model architectures to capture a wider range of physical laws without large predefined libraries.
- Susceptibility to noise due to reliance on higher-order finite-difference derivative estimates; requires smoothing and careful sparsity tuning.
- For datasets with a larger fraction of hidden states, higher-order derivative matching may be necessary, potentially increasing noise sensitivity and training complexity.
- The phase-reconstruction encoder (direct embedding) scales with dataset size, provides no reusable mapping for new data, and was harder to train; reconstructed phases show drift and higher error than the reconstructed phase gradient.
- Slightly blurry reconstructions for more complex PDEs (e.g., diffusive Lotka-Volterra) indicate encoder limitations in highly nonlinear settings.
- Requires selection/design of a predefined term library; overly large libraries hinder sparsity discovery, while too small libraries may miss true dynamics—careful curation is needed.
- Although integration is avoided, hyperparameter choices (e.g., thresholds, derivative orders, library terms) and sufficient trajectory diversity/length are important for identifiability and may limit generalizability in scarce or highly noisy data regimes.