Accurate machine learning force fields via experimental and simulation data fusion
S. Röcken and J. Zavadlav
Sebastien Röcken and Julija Zavadlav show how machine learning can fuse density functional theory (DFT) and experimental data to build more accurate titanium force fields. Their fused training approach corrects DFT inaccuracies while preserving accuracy on essential material properties.
Introduction
The study addresses the challenge of developing machine learning (ML) interatomic potentials that are both accurate and efficient for molecular dynamics (MD) simulations. Traditional bottom-up training on ab initio data (typically DFT) provides forces and energies for specific configurations but suffers from limited accuracy compared to higher-level methods, high computational cost, and potential dataset biases or distribution shifts. Top-down training on experimental observables offers richer information per sample but requires MD simulations and complicates gradient computation through long trajectories. The core research question is whether fusing DFT simulation data with experimental measurements during training can correct DFT inaccuracies while preserving broad accuracy and generalization of the ML potential. The work focuses on titanium as a test case and aims to concurrently match experimental mechanical properties and lattice parameters across temperatures while maintaining consistency with DFT energy, force, and virial predictions.
Literature Review
The paper situates itself within efforts to overcome the accuracy/efficiency trade-off in MD via ML potentials trained on quantum data (often DFT) or experimental measurements. While CCSD(T) is the gold standard, it is too costly for large datasets, making DFT the common but imperfect reference, often yielding discrepancies with experiments (e.g., titanium’s temperature-dependent lattice parameters and elastic constants, and phase diagram deviations). Dataset curation and active learning have been used to enhance coverage, but robust uncertainty quantification for NN potentials remains challenging. Top-down approaches have become feasible due to differentiable simulation frameworks; however, direct backpropagation through long trajectories is impractical. The Differentiable Trajectory Reweighting (DiffTRe) method enables training on time-independent observables by reweighting without full backpropagation. Prior hybrid strategies added two-body corrections trained to experimental data on top of fixed ML potentials trained on DFT, but such corrections are limited in reproducing many observables simultaneously. The authors propose training a single deep ML potential that alternates between DFT (bottom-up) and experiment (top-down) trainers to jointly satisfy both data sources.
Methodology
- Model: A message passing graph neural network (GNN) potential based on DimeNet++ implemented in JaxMD. Original hyperparameters are used except that embedding sizes are reduced by a factor of four for speed; cutoff set to 0.5 nm.
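To make the graph construction concrete, below is a minimal JaxMD sketch of a 0.5 nm cutoff neighbor list of the kind such a GNN potential operates on; the box size and coordinates are placeholders, and the DimeNet++ architecture itself is not reproduced here.

```python
import jax.numpy as jnp
from jax_md import space, partition

# A 0.5 nm cutoff neighbor list defining the atomic graph for a GNN potential.
box = 1.0                                    # cubic box edge (placeholder)
positions = jnp.array([[0.1, 0.1, 0.1],
                       [0.4, 0.1, 0.1]])     # (N, 3) coordinates (placeholder)
displacement, shift = space.periodic(box)
neighbor_fn = partition.neighbor_list(displacement, box, r_cutoff=0.5)
nbrs = neighbor_fn.allocate(positions)       # re-allocated/updated as atoms move
```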
- Data: DFT database (previously published) of 5704 samples including equilibrated, strained, and randomly perturbed hcp, bcc, and fcc titanium structures, plus configurations from high-temperature MD and active learning. Experimental targets are temperature-dependent solid-state elastic constants of hcp titanium measured at 22 temperatures from 4–973 K; four temperatures (23, 323, 623, 923 K) are selected for training to reduce cost while assuming temperature transferability. Zero pressure is additionally targeted by evaluating elastic constants in NVT with box sizes fixed to experimental lattice parameters, indirectly constraining lattice constants.
- Training schemes compared:
1) DFT pre-trained: train only on DFT labels (energies, forces, virials).
2) DFT, EXP sequential: initialize from DFT pre-trained, then train only on experimental observables using EXP trainer.
3) DFT & EXP fused: alternate epochs of DFT and EXP trainers starting from DFT pre-trained.
Early stopping selects the final model in all three schemes; a schematic of the fused alternation is sketched below.
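A minimal sketch of that alternation, assuming hypothetical `dft_epoch`, `exp_epoch`, and `validate` callables that stand in for the paper's two trainers and its validation metric:

```python
def train_fused(params, n_epochs, dft_epoch, exp_epoch, validate):
    """Alternate bottom-up (DFT) and top-down (EXP) epochs, keep the best model."""
    best_params, best_val = params, float("inf")
    for epoch in range(n_epochs):
        # Even epochs run the DFT trainer, odd epochs the EXP trainer.
        params = dft_epoch(params) if epoch % 2 == 0 else exp_epoch(params)
        val = validate(params)
        if val < best_val:  # early stopping: retain the best checkpoint
            best_params, best_val = params, val
    return best_params
```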
- DFT trainer: Weighted MSE loss over energies U, forces F, and virials V. Weights: ω_E = 1e−6, ω_F = 1e−6, ω_V = 4e−6 (virial only for uniformly deformed supercells). Batch optimization per epoch.
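In code, this bottom-up loss might look like the sketch below; only the weights come from the paper, while the dictionary layout and batch shapes are illustrative assumptions.

```python
import jax.numpy as jnp

# Weighted MSE over energies U, forces F, and virials V (weights from the
# paper; data layout is an assumption). `has_virial` flags uniformly
# deformed supercells, the only samples carrying virial labels.
W_E, W_F, W_V = 1e-6, 1e-6, 4e-6

def dft_loss(pred, ref, has_virial):
    loss = W_E * jnp.mean((pred["U"] - ref["U"]) ** 2)
    loss += W_F * jnp.mean((pred["F"] - ref["F"]) ** 2)
    if has_virial:
        loss += W_V * jnp.mean((pred["V"] - ref["V"]) ** 2)
    return loss
```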
- EXP trainer: Loss averages the squared errors over observables and temperatures with weights ω_P = 1e−16 for pressure and ω_C = 1e−16 for elastic constants C_ij (Voigt). Gradients via DiffTRe reweighting in the canonical ensemble; for each parameter update, a new forward trajectory is generated so reference and perturbed potentials coincide at initialization.
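The reweighting step at the heart of DiffTRe can be sketched with the standard canonical-ensemble identity below; this is not the paper's implementation. Because each update starts from a fresh trajectory generated with the current parameters, the weights initially equal 1/N while their gradients with respect to the parameters remain informative.

```python
import jax.numpy as jnp

# DiffTRe-style reweighting: frames sampled with a reference potential u_ref
# are reused to estimate observables under the current parameters (u_theta).
def reweighted_average(obs, u_theta, u_ref, kT):
    log_w = -(u_theta - u_ref) / kT        # per-frame log importance weights
    w = jnp.exp(log_w - jnp.max(log_w))    # subtract max for numerical stability
    w = w / jnp.sum(w)                     # normalize; equals 1/N when u_theta == u_ref
    return jnp.sum(w * obs)                # differentiable w.r.t. model parameters
```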
- MD simulation protocols: JaxMD velocity Verlet, 0.5 fs timestep; systems of typically 256 atoms; Ti mass 47.867 u. NVT sampling (Langevin thermostat, friction 4 ps−1) during EXP training. Postprocessing uses Nosé-Hoover chains (chain length 5, 2 chain steps, 3 Suzuki-Yoshida steps; thermostat τ = 50 fs; barostat τ = 500 fs) for NVT/NPT as needed. Pressure is set to 0.
- EXP trainer trajectory lengths: 80 ps NVT, discarding the first 10 ps; states saved every 0.1 ps. Isothermal elastic constants are obtained via the stress-fluctuation method (see the JaxMD sketch below).
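A hedged JaxMD sketch of this sampling protocol; the box, coordinates, temperature, and the stand-in pair potential are placeholders rather than the paper's setup (a trained GNN energy function would replace `soft_sphere_pair`).

```python
import jax
from jax_md import space, simulate, energy

# NVT Langevin sampling with a 0.5 fs timestep and 4 ps^-1 friction in
# ps-based units; 160,000 steps give the stated 80 ps trajectory.
box = 1.0                                              # box edge (placeholder)
positions = jax.random.uniform(jax.random.PRNGKey(1), (256, 3)) * box
displacement, shift = space.periodic(box)
energy_fn = energy.soft_sphere_pair(displacement)      # stand-in potential
init_fn, step_fn = simulate.nvt_langevin(energy_fn, shift,
                                         dt=5e-4,      # 0.5 fs in ps
                                         kT=0.0257,    # temperature (placeholder)
                                         gamma=4.0)    # friction, ps^-1
state = init_fn(jax.random.PRNGKey(0), positions, mass=47.867)  # Ti mass in u
for _ in range(160_000):                               # 80 ps; jit-compiled in practice
    state = step_fn(state)
```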
- Property evaluations:
• hcp elastic constants and pressure from 1 ns NVT production after 100 ps equilibration; bulk and shear moduli from Voigt formulas, and the Poisson ratio from ν = (3K − 2G)/(6K + 2G) (a sketch follows this list).
• hcp lattice constants from 100 ps NPT equilibration plus 100 ps production.
• Phonons via Phonopy with a 5×5×3 hcp supercell and 0.01 Å displacements.
• Liquid structure: RDF/ADF at 1965–1973 K in a 2048-atom box; ADF over triplets within 0.4 nm.
• Self-diffusion via the VACF and Green-Kubo integration at 1953–2110 K with extensive equilibration and sampling.
• bcc elastic constants at 1273 K from a 128-atom cell with 100 ps NPT + 100 ps NVT equilibration and 1 ns NVT production.
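For reference, the Voigt-average moduli and the Poisson ratio can be computed from the 6×6 elastic-constant matrix with the textbook formulas sketched below; this is standard elasticity, not the paper's analysis code.

```python
import numpy as np

# Voigt averages from a 6x6 elastic-constant matrix C (Voigt notation, GPa).
def voigt_moduli(C):
    K = (C[0, 0] + C[1, 1] + C[2, 2]
         + 2.0 * (C[0, 1] + C[0, 2] + C[1, 2])) / 9.0          # bulk modulus
    G = ((C[0, 0] + C[1, 1] + C[2, 2])
         - (C[0, 1] + C[0, 2] + C[1, 2])
         + 3.0 * (C[3, 3] + C[4, 4] + C[5, 5])) / 15.0         # shear modulus
    nu = (3.0 * K - 2.0 * G) / (6.0 * K + 2.0 * G)             # Poisson ratio
    return K, G, nu

# Example: K, G, nu = voigt_moduli(np.array(C_hcp))  # C_hcp: a 6x6 C_ij matrix
```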
- Model selection: fixed total epochs with early stopping. Alternative batch-wise switching between trainers is noted as possible but not used.
Key Findings
- Accuracy on DFT test set (RMSE/MAE):
• DFT pre-trained: Energy 6.0/4.4 meV atom−1; Force 92.5/62.1 meV Å−1; Virial 406.6/261.2 meV atom−1.
• DFT, EXP sequential: Energy 385.1/384.9 meV atom−1; Force 123.6/83.8 meV Å−1; Virial 401.5/267.1 meV atom−1. After post-hoc mean energy shift correction: Energy 14.0/9.5 meV atom−1.
• DFT & EXP fused: Energy 7.9/6.2 meV atom−1; Force 111.2/76.6 meV Å−1; Virial 405.6/263.2 meV atom−1.
- Target experimental properties (hcp Ti mechanical and lattice properties, 4–973 K):
• DFT pre-trained deviates on average by 6% (bulk modulus), 24% (shear modulus), and 9% (Poisson’s ratio), with some elastic constants off by >20 GPa.
• Both models trained with EXP data (sequential and fused) match elastic constants within a few GPa; bulk, shear, and Poisson’s ratio relative errors are <3% across the temperature range, despite fitting at only four temperatures.
• Lattice constants: the DFT & EXP fused model achieves relative deviations <0.1% from experiment; of the two EXP-trained models, the EXP sequential one is closest overall.
- Off-target solid state property:
• Phonon dispersions for hcp Ti agree well with experiment for all models. Strong agreement is expected for the DFT pre-trained and fused models; notably, the EXP sequential model also agrees well, indicating that DFT pretraining constrains the solution space and subsequent EXP training modifies parameters only locally (consistent with its similar force errors).
- Off-target liquid properties:
• RDF and ADF at ~1965–1973 K closely match experiments for all models; ADF peak positions match well though amplitudes differ slightly.
• Self-diffusion coefficients (1953–2110 K): EXP-trained models perform better on average than DFT pre-trained; EXP sequential performs best.
- Off-target pressure dependence (300 K):
• hcp lattice constants vs pressure: EXP sequential closest to experimental references; fused also in good agreement.
- bcc elastic constants at 1273 K:
• Experimental references disagree among themselves; taking Ledbetter et al. as the most accurate, the DFT & EXP fused model performs best overall (C11 = 112.0 GPa, C12 = 85.3 GPa, C44 = 32.5 GPa), compared with DFT pre-trained (98.4, 79.4, 27.4 GPa) and EXP sequential (119.9, 87.4, 34.9 GPa).
Discussion
Fusing DFT and experimental data during training enables the ML potential to reconcile partially conflicting objectives arising from DFT and measurement errors. Alternating DFT and EXP trainers yields a model that preserves near-DFT-level accuracy for energies, forces, and virials while correcting DFT biases in mechanical and lattice properties across a wide temperature range. The small increase in force error relative to DFT-only training is outweighed by substantial improvements in the target experimental observables. The EXP-only (sequential) refinement highlights that experimental training on time-independent observables constrains only derivatives of the energy, leaving absolute energies undetermined up to a constant shift; this underscores the importance of including energy-labeled DFT data when energy-dependent predictions are needed. Pretraining on DFT appears to constrain the parameter space, and subsequent EXP training modifies the model locally, which helps maintain good phonon dispersions and other off-target properties. Generalization is strong: phonons, liquid-state structure, diffusion, pressure dependence, and bcc high-temperature elasticity range from reasonable to excellent, with the fused model often providing the best balance across properties and states. This demonstrates both the high capacity of modern ML potentials and the effectiveness of data fusion for obtaining broadly accurate force fields.
Conclusion
The work introduces and validates a fused training strategy that alternates between DFT and experimental trainers to build a single deep ML potential for titanium that accurately reproduces both DFT-labeled data and key experimental observables (elastic constants and lattice parameters) across temperatures. The approach corrects DFT inaccuracies on target properties while preserving or modestly improving off-target performance (phonons, liquid structure, diffusion, pressure response, and bcc elasticity). The method eliminates the need for separate correction potentials and is generalizable to other materials. Future directions include exploring multistate reweighting for improved gradient estimates and accuracy, expanding experimental target sets to further constrain potentials, assessing optimal trainer-switching schedules (epoch vs batch), and investigating uncertainty quantification to guide active data fusion.
Limitations
- Experimental training on time-independent observables does not constrain absolute energies; energy-related predictions can be shifted if DFT data are not included (as seen in the EXP sequential model), requiring post-hoc corrections.
- DFT and experimental data contain errors and may be partially incompatible, imposing trade-offs; sequential EXP training can overfit experiments at the expense of DFT energy consistency.
- EXP training relies on forward MD simulations; computational cost grows with system size and trajectory length, and results can depend on thermostat/barostat choices and sampling.
- DiffTRe and related reweighting approaches require sufficient configuration overlap; choice of reference states and reweighting strategy affects gradients and is non-trivial. The best reweighting technique remains an open question.
- Experimental fitting used only four temperatures; while good transferability was observed across 4–973 K, broader targets may be needed for other systems or properties.
- Discrepancies among experimental references (e.g., bcc elasticity) complicate evaluation and may limit definitive benchmarking.