Physics
Synthetic Lagrangian turbulence by generative diffusion models
T. Li, L. Biferale, et al.
The study addresses how to generate realistic single-particle Lagrangian trajectories in high-Reynolds-number turbulence that faithfully reproduce multiscale, non-Gaussian statistics across the full range of time scales, from the integral scale (τ_L) down to the dissipative scale (τ_η). Despite decades of modeling—ranging from Ornstein–Uhlenbeck processes and multifractal or multiplicative cascade models to Markovian and non-Markovian approaches—no method reproduces both statistical and topological features over all scales. The authors posit that modern generative machine learning, specifically diffusion models, can synthesize Lagrangian trajectories that capture fat-tailed velocity increments, anomalous scaling, intermittency, and multicomponent correlations, thereby providing a scalable alternative to costly DNS and experiments and enabling high-quality synthetic datasets for downstream applications.
Prior Lagrangian models have included two-time Ornstein–Uhlenbeck stochastic processes to model dynamics at τ_L and τ_η, infinitely differentiable processes, and a variety of Markovian/non-Markovian multifractal or multiplicative constructions. While these reproduce selected turbulent features, they fail to deliver fully realistic, multiscale 3D Lagrangian trajectories across all regimes. In parallel, ML generative tools (VAEs, GANs, diffusion models) have achieved strong results in vision, audio, language, and healthcare. Applications in fluid mechanics include generation, super-resolution, prediction, and inpainting of Eulerian fields, but many validations have been limited to 2D, near-Gaussian regimes or low-order statistics. A recent 1D Eulerian generator captured up to fourth-order structure functions but struggled at higher orders. Thus, both equation-informed and data-driven tools have lacked comprehensive accuracy for high-Reynolds, 3D Lagrangian statistics and geometry, motivating diffusion models as a new approach.
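As a reference point for the classical baselines above, a one-time Ornstein–Uhlenbeck velocity model takes only a few lines. The Euler–Maruyama discretization and parameter choices below are illustrative, not the paper's; such a model reproduces Gaussian single-point statistics with an exponential correlation at τ_L, but none of the small-scale intermittency the paper targets:

```python
import numpy as np

def ou_velocity(n_steps, dt, tau_L, sigma, rng):
    """Euler-Maruyama sketch of a one-time Ornstein-Uhlenbeck velocity model:
        dV = -(V / tau_L) dt + sigma * sqrt(2 / tau_L) dW,
    whose stationary distribution is N(0, sigma^2) with correlation time tau_L.
    Parameter names are illustrative, not the paper's."""
    V = np.empty(n_steps)
    V[0] = sigma * rng.standard_normal()          # start in the stationary state
    for k in range(1, n_steps):
        V[k] = (V[k - 1] * (1.0 - dt / tau_L)
                + sigma * np.sqrt(2.0 * dt / tau_L) * rng.standard_normal())
    return V
```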
Data: High-resolution DNS of the 3D incompressible Navier–Stokes equations in a periodic cube with homogeneous, isotropic forcing (an Ornstein–Uhlenbeck process), solved with a pseudospectral method and 2/3-rule dealiasing, at Taylor-microscale Reynolds number R_λ ≈ 310. Lagrangian tracers are integrated using sixth-order B-spline interpolation and a second-order Adams–Bashforth scheme. The dataset comprises N_t = 327,680 trajectories, each of length T ≈ 1.3 τ_L ≈ 200 τ_η sampled at dt_L ≈ 0.1 τ_η, giving K = 2,000 points per trajectory. Particles are injected after the flow reaches stationarity to ensure stationary Lagrangian statistics.
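The tracer advection step can be sketched as follows; `u_interp` stands in for the sixth-order B-spline interpolation of the DNS velocity field, and the function and argument names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def advect_tracers_ab2(x, u_interp, dt, n_steps):
    """Advance tracer positions with a second-order Adams-Bashforth scheme.

    x        : (N, 3) tracer positions
    u_interp : callable x -> (N, 3) fluid velocity at tracer positions
               (in the paper, a sixth-order B-spline interpolation of the
               DNS field; any interpolator works for this sketch)
    """
    u_prev = u_interp(x)
    # Bootstrap the two-step scheme with a single Euler step.
    x = x + dt * u_prev
    for _ in range(n_steps - 1):
        u_now = u_interp(x)
        # AB2: x_{n+1} = x_n + dt * (3/2 u_n - 1/2 u_{n-1})
        x = x + dt * (1.5 * u_now - 0.5 * u_prev)
        u_prev = u_now
    return x
```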
Benchmarks: Velocity increments δ_τ V_i(t) = V_i(t+τ) − V_i(t). PDFs of δ_τ V and of the acceleration a_i (at 0.1 τ_η resolution) assess non-Gaussianity and extreme events. Structure functions S_p(τ) = ⟨|δ_τ V|^p⟩ with anomalous scaling exponents ζ(p) and local scaling exponents ζ(p,τ) = d log S_p(τ)/d log τ are evaluated (even orders up to p = 8). Generalized flatness F_p(τ) = S_p(τ)/[S_2(τ)]^{p/2}. Mixed-component flatness and the acceleration correlation C_ii(τ) are also computed to test multicomponent coupling.
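A minimal sketch of these diagnostics on an ensemble of DNS or synthetic trajectories (array shapes and function names are assumptions, not the paper's code):

```python
import numpy as np

def structure_functions(V, taus, orders=(2, 4, 6, 8)):
    """Lagrangian structure functions S_p(tau) = <|V(t+tau) - V(t)|^p>.

    V    : (N_traj, K) array of one velocity component sampled at dt_L
    taus : time lags in units of samples
    Returns a dict mapping order p -> array of S_p over taus.
    """
    S = {p: np.empty(len(taus)) for p in orders}
    for j, tau in enumerate(taus):
        dV = V[:, tau:] - V[:, :-tau]            # increments delta_tau V
        for p in orders:
            S[p][j] = np.mean(np.abs(dV) ** p)
    return S

def generalized_flatness(S, p):
    """F_p(tau) = S_p(tau) / S_2(tau)^(p/2); the Gaussian value is (p-1)!!."""
    return S[p] / S[2] ** (p / 2)

def local_exponent(S_p, taus):
    """zeta(p, tau) = d log S_p / d log tau, via centered log-log differences."""
    return np.gradient(np.log(S_p), np.log(np.asarray(taus, dtype=float)))
```

For Gaussian velocity signals the fourth-order flatness should sit at 3 for all lags; intermittent turbulence shows F_4 growing well above 3 as τ → τ_η.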
Models: Two denoising diffusion probabilistic models (DMs) are trained: DM-1c (single-component velocity synthesis) and DM-3c (simultaneous synthesis of all three correlated components). Forward diffusion progressively corrupts trajectories with Gaussian noise; the reverse process is learned by a UNet that denoises step by step. Training minimizes a reweighted MSE loss on the noise (L_simple) at each step. The UNet follows a state-of-the-art architecture with multihead attention. Training details: DM-1c is trained for ~250k iterations and DM-3c for ~400k; each training batch draws a random diffusion step n per sample. An optimized tanh noise schedule (tanh6-1) enables N = 800 diffusion steps (vs. 4,000 for a linear schedule), improving efficiency; a power-law alternative was slightly inferior.
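The forward (noising) half of the DDPM with a tanh-shaped schedule can be sketched as below. The exact parametrization of the paper's tanh6-1 schedule is not reproduced here, so the constants (β range, ramp coefficients) are placeholders:

```python
import numpy as np

def tanh_schedule(N=800, beta_min=1e-4, beta_max=2e-2, a=6.0, b=1.0):
    """Monotone per-step noise levels beta_n shaped by a tanh ramp.

    The paper's optimized 'tanh6-1' schedule allows N = 800 steps where a
    linear schedule needs ~4,000; the parametrization here is illustrative."""
    n = np.linspace(0.0, 1.0, N)
    ramp = (np.tanh(a * n - b) - np.tanh(-b)) / (np.tanh(a - b) - np.tanh(-b))
    return beta_min + (beta_max - beta_min) * ramp

def forward_diffuse(x0, n, alphas_bar, rng):
    """Sample x_n ~ q(x_n | x_0) = N(sqrt(abar_n) x0, (1 - abar_n) I),
    returning both the noised trajectory and the noise eps used."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[n]) * x0 + np.sqrt(1.0 - alphas_bar[n]) * eps, eps

# L_simple is then the MSE between the true noise eps and the UNet's
# prediction: loss = mean((eps - eps_theta(x_n, n))**2).
```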
Evaluation: Compare DM-generated statistics with DNS across scales: PDFs of δ_τ V (various τ/τ_η), acceleration PDFs, S_p(τ) for p=2,4,6, generalized flatness F_p(τ) for p=4,6,8, mixed-component flatness, local exponents ζ(4,τ) (and similarly for higher orders), and acceleration correlation. A reduced-data model (DM-1c-10%) tests data efficiency and generalization to extreme events. A Wasserstein GAN baseline was also evaluated for comparison. Computational costs and data/code availability are reported.
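The normalized acceleration autocorrelation used in the comparison can be estimated from velocity trajectories by finite differences; this is a generic estimator sketch, not the paper's code:

```python
import numpy as np

def acceleration_autocorr(V, dt, max_lag):
    """C(tau) / C(0) with C(tau) = <a(t) a(t + tau)>, where the acceleration
    is obtained by finite-differencing the velocity (the data's 0.1 tau_eta
    sampling makes this reasonable).

    V : (N_traj, K) velocity component samples at spacing dt
    """
    a = np.gradient(V, dt, axis=1)                 # a(t) ~ dV/dt
    a = a - a.mean()
    K = a.shape[1]
    C = np.array([np.mean(a[:, :K - k] * a[:, k:]) for k in range(max_lag)])
    return C / C[0]
```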
- Diffusion models accurately reproduce non-Gaussian PDFs of velocity increments across scales, with excellent agreement for τ > τ_η and increasingly intermittent fat tails toward small τ.
- Acceleration PDFs from DM-1c match DNS fat tails up to ~60–70 standard deviations, showing strong capture of extreme events.
- Structure functions S_p(τ) for p = 2, 4, 6 and generalized flatness F_p(τ) for p = 4, 6, 8 align closely with DNS over multiple decades in τ; odd orders vanish due to symmetry, as expected.
- Local scaling exponents ζ(p,τ) (shown for p = 4; similar for p = 6, 8) match DNS and state-of-the-art experiments/DNS, including the characteristic dip near τ ≈ τ_η, within ~5% error.
- DM-3c reproduces multicomponent correlations: mixed-component flatness agrees well with DNS, and 3D trajectories exhibit coherent vortical structures similar to DNS.
- Slight deviations appear below the dissipative scale (τ → 0): DM-3c signals are marginally smoother; acceleration and flatness statistics show small discrepancies.
- Single-time second-order statistics (Table 2) are well matched (e.g., E: DNS 3.0 vs DM-3c 2.9; λ: DNS 1.7×10⁻³ vs DM-3c 1.6×10⁻³), though cross-acceleration correlations are underestimated by DM-3c compared to DNS.
- Strong generalizability: DM-generated datasets exhibit extended tails and rarer, more intense events than present in the training set while preserving realistic statistics. A model trained on only 10% of DNS (DM-1c-10%) achieves similar accuracy, indicating data efficiency and robust generalization.
- A Wasserstein GAN baseline was satisfactory at large and intermediate scales but failed at small scales, underperforming DMs for multiscale intermittent statistics.
The diffusion-model approach breaks a long-standing barrier in modeling multiscale Lagrangian turbulence by synthesizing trajectories that reproduce fat-tailed statistics, anomalous scaling laws, intermittency, and multicomponent coupling from large to dissipative scales. Accurate reproduction of structure functions and local scaling exponents (including the dip near τ_η) demonstrates that the learned generative process captures subtle, scale-dependent non-Gaussian properties central to Lagrangian turbulence. The model’s ability to generate rarer, more intense events than seen during training while maintaining correct statistics underscores strong generalization and makes it practical to produce large, high-quality synthetic datasets for downstream tasks. Remaining discrepancies near the smallest scales (slightly smoother signals, reduced cross-acceleration correlations) indicate opportunities for refinement in modeling dissipative-range dynamics and component cross-couplings.
This work introduces DM-1c and DM-3c diffusion models that generate synthetic single-particle Lagrangian trajectories matching DNS across multiscale benchmarks: velocity increment PDFs, acceleration PDFs, structure functions up to eighth order, generalized and mixed-component flatness, and local scaling exponents. The models generalize to rarer extremes beyond training data and provide scalable, high-quality datasets for pretraining and analysis in turbulent dispersion problems. Future directions include conditional diffusion models to adapt across different flows, boundary conditions, and higher Reynolds numbers; wavelet-factorized generative schemes to improve interpretability and control over multiscale structure; and accelerated sampling techniques. Applications include relative dispersion (Richardson diffusion), multiparticle shape dynamics, augmentation for ocean-surface drifters, inertial particle trajectory generation/classification, and data inpainting.
- Slight underperformance below the dissipative scale (τ → 0), with smoother signals and small deviations in acceleration and flatness statistics.
- Underestimation of cross-acceleration correlations relative to DNS in DM-3c.
- Current models are trained for a single flow configuration (homogeneous, isotropic forcing at R_λ ≈ 310) and are not yet generalized to other boundary conditions, forcing mechanisms, or higher Reynolds numbers.
- While efficient relative to DNS requirements for Lagrangian data, diffusion-model training and sampling still entail substantial computational costs; sampling speed-ups remain for future work.