Engineering and Technology
Forecasting the outcome of spintronic experiments with Neural Ordinary Differential Equations
X. Chen, F. A. Araujo, et al.
The study addresses the challenge of predicting complex, time-dependent behaviors of spintronic devices, which exhibit rich dynamics due to intricate magnetic textures and multiple excitations (fields, currents, voltages, temperature, pressure). Conventional micromagnetic simulations solve large coupled Landau–Lifshitz–Gilbert equations over many cells, incurring prohibitive runtimes and difficulty fitting experimental data due to geometry/material uncertainties and noise. The research question is whether Neural Ordinary Differential Equations (Neural ODEs) can be reformulated to learn the governing dynamics of nanomagnetic systems from limited measurements (typically a single observable) under external, time-varying inputs, and then forecast long-term behavior and experimental outcomes efficiently and accurately. The work introduces modifications to Neural ODEs to handle partial observability and exogenous inputs, aiming to complement micromagnetic simulations and enable fast prediction and optimization of spintronic experiments and applications such as neuromorphic computing.
Prior approaches rely on micromagnetic simulations, which are accurate but slow and sensitive to model inaccuracies; analytical models often neglect texture deformations or material imperfections, limiting their applicability to multi-skyrmion and real devices. Machine learning has been applied in physics to material discovery and dynamics learning, and in micromagnetism to extract microstructural features or predict short-term magnetization dynamics. Neural ODEs, attractive for continuous-time modeling and efficient adjoint-based training, had two key limitations for physical devices: they require full state observability and cannot handle time-varying inputs. Related works address partial observability via physics-informed architectures (Hamiltonian/Lagrangian neural networks) or augmented higher-order Neural ODEs, but these often require numerical derivatives or extra processing that is not robust to noise. Neural ODEs with inputs have been explored (augmented and parameterized variants), but a practical framework tailored to noisy, partially observed experimental nanodevices with exogenous drives was lacking. This work combines time-delay embedding with Neural ODEs to close these gaps.
- System targets: Spintronic devices including skyrmion-based nanodisks (single and multi-skyrmion with grain inhomogeneity) and an experimental spin-torque nano-oscillator used in reservoir computing tasks.
- Neural ODE reformulation for partial observability: Construct the state vector y(t) from the single measured observable y1(t) and its k−1 time-delayed copies: y(t) = [y1(t), y1(t+Δta), ..., y1(t+(k−1)Δta)]. This delay-embedding captures hidden state information without noise-amplifying numerical derivatives, justified by Takens/Sauer embedding theorems.
- Incorporation of time-varying inputs: Extend inputs e(t) similarly with delayed versions [e(t), e(t+Δta), ..., e(t+(k−1)Δta)], and include time t as a state with derivative 1. The Neural ODE takes the form ẏ = fθ(y, e(t), t), where fθ is a feedforward neural network.
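As a concrete illustration of the two bullets above, the delay-embedded state can be built from the sampled observable alone; the same construction applies to the input e(t). A minimal NumPy sketch (the function name `delay_embed` and the stride convention are ours, not the paper's):

```python
import numpy as np

def delay_embed(series, k, stride=1):
    """Stack a scalar time series with its k-1 delayed copies.

    series: 1-D array of samples of the single observable y1.
    k: embedding dimension; stride: delay in samples (Delta_ta / Delta_t).
    Returns an array of shape (len(series) - (k-1)*stride, k) whose row i is
    [y1(t_i), y1(t_i + stride*dt), ..., y1(t_i + (k-1)*stride*dt)].
    """
    n = len(series) - (k - 1) * stride
    return np.stack([series[j * stride : j * stride + n] for j in range(k)],
                    axis=1)

# Example: embed a sampled sine with k=2 (one delayed copy), as used
# for the noiseless simulation data in the paper.
y1 = np.sin(np.linspace(0.0, 6.0, 100))
Y = delay_embed(y1, k=2)
print(Y.shape)  # (99, 2)
```

The embedded state Y and the similarly embedded input, plus time t as an extra state with derivative 1, together form the arguments of the learned right-hand side fθ.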
- Training algorithm (Algorithm 1): Mini-batch training over short trajectory segments with the Adam optimizer to minimize the MSE between predicted and observed trajectories. ODE integration via a fixed-step fourth-order Runge–Kutta scheme (3/8 rule); gradients computed with the adjoint sensitivity method.
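A minimal sketch of the 3/8-rule Runge–Kutta step named above (not the authors' implementation; shown here on a toy scalar ODE rather than a trained fθ):

```python
import numpy as np

def rk4_38_step(f, t, y, h):
    """One fixed step of the 3/8-rule variant of 4th-order Runge-Kutta."""
    k1 = f(t, y)
    k2 = f(t + h / 3, y + h * k1 / 3)
    k3 = f(t + 2 * h / 3, y + h * (-k1 / 3 + k2))
    k4 = f(t + h, y + h * (k1 - k2 + k3))
    return y + h * (k1 + 3 * k2 + 3 * k3 + k4) / 8

# Sanity check on dy/dt = -y, y(0) = 1, integrated to t = 1.
y, t, h = 1.0, 0.0, 0.01
for _ in range(100):
    y = rk4_38_step(lambda t, y: -y, t, y, h)
    t += h
print(abs(y - np.exp(-1.0)) < 1e-6)  # True
```

In training, `f` would be the neural network fθ evaluated on the delay-embedded state and inputs, and the step would be wrapped in an adjoint-capable ODE solver.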
- Network architectures: Three-layer tanh MLP for fθ. Hidden units: 50 for MuMax simulation data; 100 for experimental oscillator data. Output layer linear. Weights initialized N(0, 0.1²).
- Hyperparameters and data handling: Delay step Δta set to sampling interval Δt. Normalize time by factor s = 0.0125/p (p is base sampling period), and scale inputs/outputs (e.g., Δmz and voltages multiplied by 10 during training). Mini-batch time length bt=20, mini-batch size bs=50, learning rate 0.001.
- Micromagnetic simulations (MuMax3): Mesh 1×1×1 nm³; A=15 pJ/m, Ms=580 kA/m, α=0.01, interfacial D=3.5 mJ/m², PMA Ku=0.8 MJ/m³; VCMA coefficient ≈100 fJ V⁻¹ m⁻¹ (1 nm oxide); 0.1 V induces ΔKu ≈10 kJ/m³. • Single skyrmion (Fig. 2–3): 80 nm disk; input: random sine voltage 4 GHz, amplitude ±2 V (ΔKu ∈ [−0.2, 0.2] MJ/m³). • Multi-skyrmion with grains: 120 nm disk; grain size 10 nm; random 20% Ku and D variations, 5% random cubic anisotropy direction; input: random sine 4 GHz, ±2 V. • Parameter sweep (Fig. 1): 100 nm disk; training inputs ΔKu random sine 4 GHz, ±0.05 MJ/m³ about 0.8 MJ/m³; ΔD random sine 0.4 GHz, ±0.4 mJ/m² about 3.0 mJ/m²; output Δmz sampled every p=2.5 ps for 50 ns (simulation time ~37–43 min). Test: pulse ΔKu=0.04 MJ/m³ (or ΔD=0.1 mJ/m²) for 1 ns, FFT of Δmz for frequency extraction.
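As a quick consistency check of the VCMA figures quoted above (assuming the induced anisotropy change is spread over a 1 nm magnetic layer, matching the 1 nm mesh; that thickness is our assumption):

```python
# Back-of-envelope check: VCMA coefficient ~100 fJ/V/m with a 1 nm oxide
# should give dKu ~ 10 kJ/m^3 for 0.1 V applied.
xi = 100e-15        # VCMA coefficient, J V^-1 m^-1 (100 fJ/V/m)
V = 0.1             # applied voltage, V
t_ox = 1e-9         # oxide thickness, m
t_mag = 1e-9        # magnetic layer thickness, m (assumed, matches mesh)

E = V / t_ox                  # electric field across the oxide, V/m
dKu = xi * E / t_mag          # anisotropy change per unit volume, J/m^3
print(f"{dKu:.1f} J/m^3")     # 10000.0 J/m^3 = 10 kJ/m^3, as quoted
```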
- Neural ODE training datasets: Training/validation points: single skyrmion 10k/5k; multi-skyrmion 10k/5k; parameter-based 15k/5k; experimental oscillator 50k/10k. Sampling Δt chosen as 2p for simulations and p for experiments (p=2.5 ps for MuMax; 100 ns for oscillator measurements).
- Reservoir computing tasks: • Mackey–Glass prediction: Generate MG series (β=0.2, γ=0.1, τ=17), solve dt=0.1 for 100k steps; downsample by 10 to 10k points; use masking to create Nr=50 virtual nodes; each masked value held ≈10 ps in the reservoir; readout via ridge regression (μ=1e−4). Train on first 5000+H points, test on remaining 5000−H. Compare Neural ODE vs MuMax predictions, compute NRMSE vs horizon H. • Spoken digit recognition (TI-46): Five female speakers, digits 0–9, 10 utterances each. Preprocessing via cochlear or spectrogram filtering to Ny channels, mask to No=400 virtual neurons; each preprocessed value applied as constant current for tc=100 ns; readout via Moore–Penrose pseudo-inverse; cross-validated over choices of N training utterances (average over 10!/(N!(10−N)!) combinations). Train Neural ODE on 5 ms (50,000 samples) from first speaker’s first utterance; predict all others.
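The Mackey–Glass generation and ridge readout described above can be sketched as follows, assuming simple Euler integration of the delay equation and the standard ridge formula (`mackey_glass` and `ridge_readout` are our names; the paper's exact integration scheme is not specified here):

```python
import numpy as np

def mackey_glass(n, beta=0.2, gamma=0.1, tau=17.0, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = beta*x(t-tau)/(1+x(t-tau)^10) - gamma*x."""
    lag = int(round(tau / dt))
    x = np.full(n + lag, x0)          # constant initial history
    for i in range(lag, n + lag - 1):
        x_tau = x[i - lag]
        x[i + 1] = x[i] + dt * (beta * x_tau / (1.0 + x_tau**10)
                                - gamma * x[i])
    return x[lag:]

series = mackey_glass(100_000)[::10]  # 100k steps at dt=0.1, downsample by 10
print(series.shape)  # (10000,)

def ridge_readout(X, y, mu=1e-4):
    """Ridge regression W = (X^T X + mu I)^-1 X^T y for the linear readout.
    X would hold the Nr = 50 virtual-node responses; y the MG targets."""
    return np.linalg.solve(X.T @ X + mu * np.eye(X.shape[1]), X.T @ y)
```

The masking step (multiplying each input sample by a fixed random mask to create the virtual nodes) sits between the series and `X`; it is omitted here for brevity.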
- Noise modeling for experiments: From 5 ms training, compute error (experiment − noiseless Neural ODE) distribution; fit Gaussian (σerr ≈1.83 mV). Inject Gaussian noise at input so that predicted output noise σout ≈1.76 mV matches σerr, improving recognition-rate agreement. Micromagnetic simulation of the oscillator task is infeasible (extrapolated ~716 years on GTX 1080), while Neural ODE simulation ~2 hours.
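The noise-calibration recipe above can be sketched with synthetic stand-ins for the measured and predicted traces (the real calibration uses the 5 ms experimental record and the trained model; everything below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: in the paper these are the measured oscillator voltage and the
# noiseless Neural ODE prediction over the 5 ms training window.
predicted = 10e-3 * np.sin(np.linspace(0.0, 100.0, 50_000))          # volts
measured = predicted + rng.normal(0.0, 1.83e-3, predicted.shape)     # + noise

# Step 1: fit a Gaussian to the residuals; its std plays the role of sigma_err.
sigma_err = (measured - predicted).std()
print(f"{sigma_err * 1e3:.2f} mV")  # ~1.83 mV

# Step 2: inject zero-mean Gaussian noise at the model *input*, tuning its
# std until the simulated output noise matches sigma_err (the paper reports
# sigma_out ~ 1.76 mV against sigma_err ~ 1.83 mV).
def noisy_input(e, sigma_in):
    """Exogenous drive with calibrated Gaussian input noise injected."""
    return e + rng.normal(0.0, sigma_in, np.shape(e))
```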
- Reformulated Neural ODE framework handles partial observability using time-delay embedding and incorporates time-varying inputs by augmenting with delayed inputs and explicit time state.
- Skyrmion device modeling: • Training on 50 ns with random ΔKu (±0.05 MJ/m³) and ΔD (±0.4 mJ/m²) around Ku=0.8 MJ/m³, D=3 mJ/m² yields excellent agreement with MuMax for Δmz(t) and intrinsic breathing frequency across Ku and D sweeps. • Model dimension: k=2 (with one delay) yields near-zero training MSE; k=1 fails to converge to low error. Higher k beneficial in noisy settings. • Multi-skyrmion with grain inhomogeneity accurately predicted; coherent oscillations preserved; k≥2 sufficient.
- Reservoir computing (Mackey–Glass): Neural ODE predictions match micromagnetic simulations across horizons H, with NRMSE vs H curves nearly identical for single and multi-skyrmion reservoirs. Computational speedup: ~200× (single skyrmion) and ~360× (multi-skyrmion). Example runtimes: Neural ODE ~20 minutes vs MuMax ~3–5 days.
- Experimental nano-oscillator: • Trained on only 5 ms measured data; Neural ODE (k=2) reproduces voltage output trajectories closely; training loss approaches very small values but not zero due to experimental noise. • Noise calibration: σout ≈1.76 mV matches σerr ≈1.83 mV from error distribution; adding matched Gaussian noise to inputs enables recognition rates that closely match experiments; without noise, recognition rates deviate from experiments. • End-to-end prediction of spoken digit recognition across speakers feasible: Neural ODE simulation ~2 hours vs week-long experiments; micromagnetics would require ~716 years (extrapolated).
- Generalization: Trained models predict responses to inputs with different waveforms than used in training (e.g., pulses vs random sines) and to different parameter regimes (Ku, D).
By augmenting Neural ODEs with delay-embedded observables and delayed exogenous inputs, the study overcomes the practical obstacles of limited measurements and driven dynamics in physical devices. The trained models accurately capture deterministic nanomagnetic dynamics, enabling fast and faithful prediction of both simulated skyrmion systems and real nano-oscillator experiments. This addresses the need for efficient, experiment-faithful models in spintronics, where micromagnetic simulations are often too slow and miss experimental noise/imperfections. The approach preserves physical interpretability in terms of response frequencies and dynamical modes and generalizes to unseen input patterns and parameter values. In neuromorphic tasks (reservoir computing), the Neural ODE models reproduce the performance of full micromagnetic simulations with orders-of-magnitude speedup, and when augmented with calibrated noise, match experimental recognition rates, highlighting the method’s utility for rapid design-space exploration, optimization, and forecasting. The framework is applicable to other electronic dynamical systems provided representative training data are available.
The paper presents a Neural ODE-based methodology tailored to physical nanodevices that requires only a single measured output and accommodates time-varying inputs via delay embedding. It achieves high-fidelity, fast predictions for skyrmion-based devices and spintronic nano-oscillators, reproducing micromagnetic and experimental behaviors while reducing simulation time by hundreds of times. The method enables forecasting outcomes of complex experiments (e.g., reservoir computing tasks) and can support rapid evaluation and optimization in spintronics and beyond. Future directions include extending to stochastic Neural ODEs to model fundamentally stochastic physical behaviors, expanding training datasets to encompass multiple behavioral regimes, and integrating the approach into machine learning-assisted simulation platforms.
- The approach assumes underlying deterministic dynamics; it cannot fully model profoundly stochastic behaviors (e.g., room-temperature stochastic switching in MTJs or certain domain-wall motions) without stochastic extensions.
- Accurate predictions require training data representative of the various dynamical regimes; generalization is limited if the training set lacks certain behaviors.
- Delay-embedding dimension k must be chosen carefully, particularly in noisy data; insufficient k (e.g., k=1) degrades performance.
- While robust to noise compared to derivative-based methods, the model still requires noise handling (e.g., calibrated Gaussian input noise) to match experimental outcomes.
- Availability and continuity of time-series training data constrain model training; the method does not infer exact governing equations but a learned surrogate.