
Engineering and Technology
Machine learned interatomic potentials using random features
G. Dhaliwal, P. B. Nair, et al.
This article summarizes a method for modeling interatomic interactions, developed by Gurjot Dhaliwal, Prasanth B. Nair, and Chandra Veer Singh, that offers rapid parameter estimation and large reductions in training time. The approach predicts energies and forces in close agreement with density functional theory results.
Introduction
Classical molecular dynamics relies on an interatomic potential (IP) to describe interatomic interactions, trading some accuracy for computational speed compared to ab initio methods like DFT. Empirical, physically motivated IPs can be sensitive to parameter choices and often lack transferability across materials. Machine-learning (ML) interatomic potentials, including linear and kernel regression and neural networks, leverage atomic descriptors (or learn them) to predict energies and forces with higher flexibility and transferability. Kernel methods such as the Gaussian Approximation Potential (GAP) and AGNI have been successful but suffer from high training and prediction costs and memory demands due to large kernel matrices, which hinders hyperparameter tuning, active learning, and multi-element systems. The present work proposes a scalable alternative that approximates the kernel feature space with low-dimensional random features, enabling linear least-squares training with reduced computational complexity while maintaining DFT-level accuracy.
Literature Review
The paper reviews prevalent ML approaches for IP development: kernel regression (e.g., GAP) and neural networks (including descriptor-learning embeddings). GAP uses Gaussian processes with kernels over atomic descriptors (e.g., SOAP) and has been applied to metals, semiconductors, and amorphous solids; AGNI targets metals. However, standard kernel methods scale poorly: parameter estimation typically costs O(N^3) with O(N^2) memory for N training points, and inference costs O(N), which is prohibitive for training sets up to 10^6–10^7 atomic environments and for MD trajectories. Sparse approximations in GAP alleviate but do not eliminate the bottlenecks. Random feature approximations to kernels (Random Fourier Features for stationary kernels; random feature maps for dot-product/non-stationary kernels) are established in ML but have not been widely applied to IPs. This work applies these techniques to interatomic potentials and compares against state-of-the-art GAP and classical empirical potentials (AIREBO, EAM/Finnis–Sinclair, Tersoff).
Methodology
Modeling approach: The local atomic energy is expressed via kernels over atomic descriptors and approximated using random features.
- Stationary kernels: Use Random Fourier Features (RFF). By Bochner’s theorem, a stationary kernel K(q, q_l) = k(q − q_l) can be approximated by the inner product z(q)·z(q_l), where z(q) collects random Fourier features such as z_m(q) = √(2/M) cos(ω_m^T q + b_m), with frequencies ω_m sampled from the kernel’s spectral density and phases b_m drawn uniformly from [0, 2π] (equivalently, paired cosine/sine features). The energy is modeled as E(q) = Σ_m α_m z_m(q). Orthogonal random features (O-RFF) are also explored: enforcing orthogonality among the sampled ω_m reduces the variance of the kernel estimate and the number of features required.
- Non-stationary (dot-product) kernels: Use random feature maps (RFM) built from the Maclaurin expansion of an analytic dot-product kernel (by Schoenberg’s theorem, such kernels have nonnegative expansion coefficients). Each feature is a randomized product of descriptor projections constructed so that z(q)·z(q_l) is an unbiased estimate of the kernel; the energy is again a linear expansion, E(q) = Σ_m β_m z_m(q).
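The stationary-kernel RFF construction above can be sketched in a few lines of NumPy. A Gaussian kernel is assumed purely for illustration (its spectral density is itself Gaussian); the paper's kernels, length scales, and normalizations may differ:

```python
import numpy as np

def rff_features(Q, M, length_scale=1.0, seed=0):
    """Map descriptors Q (n, d) to M random Fourier features whose inner
    product approximates a Gaussian kernel k(q, q') = exp(-||q-q'||^2 / (2 l^2))."""
    rng = np.random.default_rng(seed)
    n, d = Q.shape
    # Spectral density of the Gaussian kernel is Gaussian: omega ~ N(0, 1/l^2)
    omega = rng.normal(scale=1.0 / length_scale, size=(d, M))
    b = rng.uniform(0.0, 2 * np.pi, size=M)     # random phases
    return np.sqrt(2.0 / M) * np.cos(Q @ omega + b)

# Sanity check: z(q) . z(q') converges to the exact kernel as M grows
rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 3))
Z = rff_features(Q, M=50000, length_scale=1.0, seed=2)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * np.sum((Q[:, None] - Q[None]) ** 2, axis=-1))
assert np.max(np.abs(K_approx - K_exact)) < 0.05
```

Once the features are fixed, the energy model E(q) = Σ_m α_m z_m(q) is linear in α, which is what makes least-squares training cheap.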
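For the dot-product case, a minimal random Maclaurin feature map in the spirit of the RFM construction can be sketched as follows. The exponential dot-product kernel and all names here are illustrative assumptions, not the paper's exact scheme:

```python
import math
import numpy as np

def random_maclaurin_features(X, M, coeff, p=2.0, seed=0):
    """Random Maclaurin feature map for an analytic dot-product kernel
    k(x, y) = sum_n coeff(n) * (x . y)^n. Each feature samples a degree n
    with P(n) = (1 - 1/p) / p^n, then takes a scaled product of n Rademacher
    projections; z(x) . z(y) is an unbiased estimate of k(x, y)."""
    rng = np.random.default_rng(seed)
    n_pts, d = X.shape
    Z = np.empty((n_pts, M))
    for m in range(M):
        n = rng.geometric(1.0 - 1.0 / p) - 1           # degree, P(n) = (1-1/p)/p^n
        scale = math.sqrt(coeff(n) * p**n / (1.0 - 1.0 / p))
        feat = np.ones(n_pts)
        for _ in range(n):
            w = rng.choice([-1.0, 1.0], size=d)        # Rademacher projection
            feat *= X @ w
        Z[:, m] = scale * feat
    return Z / math.sqrt(M)

# Example: k(x, y) = exp(x . y) has Maclaurin coefficients 1/n!
exp_coeff = lambda n: 1.0 / math.factorial(n)
```

The importance-sampled scale factor cancels the sampling probability of each degree, which is what makes the estimate unbiased in expectation.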
Training objective: Weights α and β are obtained by minimizing a regularized least-squares loss over energies and forces: L(α,β) = Σ (E_i−E_i^*)^2 + Σ (F_k−F_k^*)^2 + λ(||α||^2 + ||β||^2). Forces are computed as analytical derivatives of the energy with respect to atomic positions using the chain rule and descriptor gradients. The optimization reduces to solving linear normal equations. Computational complexity for training is O(MN), much lower than O(N^3) for full kernel methods; prediction is O(M) per evaluation.
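Because the model is linear in the weights, the training step above reduces to solving normal equations. A minimal sketch, with illustrative names and the energy/force feature rows assumed pre-assembled:

```python
import numpy as np

def fit_weights(Z_E, E, Z_F, F, lam=1e-6):
    """Solve the regularized least-squares problem
        min_a ||Z_E a - E||^2 + ||Z_F a - F||^2 + lam * ||a||^2
    via the normal equations. Z_E holds one feature row per training energy;
    Z_F holds the (analytical) feature gradients, one row per force component."""
    M = Z_E.shape[1]
    A = Z_E.T @ Z_E + Z_F.T @ Z_F + lam * np.eye(M)   # (M, M) Gram matrix
    b = Z_E.T @ E + Z_F.T @ F
    return np.linalg.solve(A, b)
```

Since the number of features M is fixed and small, assembling the normal equations scales linearly in the number of training rows, in contrast to the cubic cost of full kernel regression.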
Descriptors and cutoffs: The framework is descriptor-agnostic. In this study, two-body and three-body descriptors are used for graphene; SOAP descriptors are used for diamond and tungsten. A smooth cutoff function f_cut(r) zeros contributions beyond a cutoff radius. Descriptor parameters are consistent with prior GAP datasets.
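As an illustration of a smooth cutoff, a cosine form common in interatomic potentials is sketched below; the paper's exact functional form is not reproduced here and may differ:

```python
import numpy as np

def f_cut(r, r_cut):
    """Smooth cosine cutoff: equals 1 at r = 0, decays monotonically, and
    reaches 0 with zero slope at r = r_cut, so forces stay continuous.
        f(r) = 0.5 * (cos(pi * r / r_cut) + 1)  for r < r_cut, else 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)
```

The zero slope at r_cut matters because forces are derivatives of the energy: a cutoff that merely reaches zero in value would still produce a force discontinuity.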
Data and systems: Datasets for graphene, diamond, and tungsten were obtained from libatoms.org (accessed March 3, 2019), with energies and forces computed by DFT as in prior works. Training/test splits follow GAP benchmarks; diamond data were randomly partitioned. For graphene, both two- and three-body descriptors are tested; for diamond and tungsten, SOAP with RFF is employed. Additional Q-RFF models were tested on multi-element Li-Si-P-S and Li-P-S-S datasets from DeepMD.
DFT reference and phonons: Plane-wave DFT (VASP) with GGA-PBE and PAW pseudopotentials, Monkhorst–Pack k-point grid (7×7×7). Phonon dispersions computed via small-displacement method (ASE implementation) with support grid for accurate gradients.
MD implementation: Random-feature IPs (RFF, O-RFF, RFM) implemented in ASE and compared to GAP via QUIP and to classical IPs (AIREBO, Finnis–Sinclair/EAM, Tersoff). Structural (EOS, lattice constant, cohesive energy), elastic (C_ij), and phonon properties were evaluated. Runtime and training time comparisons are reported; O-RFF reduces the number of features and training time substantially.
Algorithms: The paper details Algorithm 1 (RFF) and Algorithm 2 (RFM) for constructing feature maps, including quasi-random sampling to make RFF deterministic, and outlines force expressions for both models.
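The orthogonality idea behind O-RFF can be sketched with the standard orthogonal-random-features recipe of QR-orthogonalized Gaussian blocks with chi-distributed row norms; the paper's exact sampling scheme (including its quasi-random variant) may differ:

```python
import numpy as np

def orthogonal_frequencies(d, M, length_scale=1.0, seed=0):
    """Sample an (M, d) frequency matrix whose rows are orthogonal within each
    d x d block: orthogonalize a Gaussian block by QR, then rescale rows by
    chi-distributed norms so each row's marginal matches N(0, 1/l^2)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(int(np.ceil(M / d))):
        G = rng.normal(size=(d, d))
        Q, _ = np.linalg.qr(G)                     # square Q: orthonormal rows
        s = np.sqrt(rng.chisquare(d, size=d))      # chi_d norms of Gaussian rows
        blocks.append((Q * s[:, None]) / length_scale)
    return np.vstack(blocks)[:M]
```

These frequencies replace the i.i.d. Gaussian draws in plain RFF; the orthogonality within each block lowers the variance of the kernel estimate, which is why fewer features suffice.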
Key Findings
- Accuracy vs DFT:
- Across graphene, diamond, and tungsten, force MAEs relative to DFT are on the order of 10^-2 eV Å^-1.
- Graphene: energy RMSE ≈ 0.070 eV and out-of-plane force RMSE ≈ 0.044 eV Å^-1 for RFF; R^2 > 0.99 for force predictions. Energy fits approach chemical accuracy of ≈1 kcal/mol (0.043 eV).
- Phonons (graphene): RFF and O-RFF phonon frequencies closely match DFT, with maximum absolute frequency errors of ≈7 meV (RFF) and ≈6 meV (O-RFF); overall, phonon frequencies agree with DFT to within 0.1%.
- Diamond and tungsten: As number of random features increases, both energy and force MAE decrease and fall below GAP benchmarks for force. EOS fits (RFM vs GAP) are nearly identical. Lattice constants predicted by RFM are within ≈0.01% of DFT for both materials; bulk moduli close to DFT.
- Comparison to GAP and classical IPs:
- Training time reduced by ≈96% versus GAP with orthogonal random features (O-RFF), and model size (number of parameters) reduced by an order of magnitude while maintaining accuracy.
- Graphene (Table 1, MAE):
- GAP (2B): MAE(E)=52.705 meV/atom; MAE(F)=0.234 eV Å^-1; Parameters=50; Training time=144 s.
- GAP (3B): MAE(E)=28.945 meV/atom; MAE(F)=0.325 eV Å^-1; Parameters=250; Training time=103 s. (Some rows of the source table are duplicated or corrupted; overall, the authors report that RFF and O-RFF substantially improve energy MAE at comparable force MAE, with a large reduction in training time.)
- RFF (3B): MAE(E)=0.415 meV/atom; MAE(F)=0.449 eV Å^-1; Parameters=400; Training time=4229 s.
- O-RFF (3B): MAE(E)=0.29 meV/atom; MAE(F)=0.047 eV Å^-1; Parameters=200; Training time=256 s.
- Diamond forces (Table 2): RFM MAE train/test = 0.022/0.024 eV Å^-1; far lower errors than classical IPs (AIREBO, AIREBO-M, Tersoff) and reported lower than GAP.
- MD property predictions:
- Graphene: RFF cohesive energy and lattice constant close to GAP and DFT; cohesive energy within ≈1% of DFT; AIREBO deviates by ≈4% for cohesive energy. Elastic constants: RFF C11 within ≈12% of GAP; C12 deviation ≈12% from GAP and ≈8% from experiment; O-RFF lattice constant and cohesive energy agree closely with RFF, C11 within 2% and C12 within 3.2% of RFF.
- Diamond and tungsten: Elastic constants C11, C12, C44 near DFT; for tungsten, C12 slightly overpredicted (≈225 GPa vs DFT ≈204 GPa). Mechanical properties (Young’s modulus, Poisson’s ratio) agree well with DFT.
- Defect properties (tungsten): Vacancy formation energies accurately reproduced, comparable to DFT.
- Multi-element systems: Q-RFF potentials for Li-Si-P-S and Li-P-S-S achieve force MAE ≈ 0.008 eV Å^-1 versus DFT.
- Efficiency and scalability:
- Training complexity O(MN) and inference O(M), avoiding storage of training configurations at prediction time; large reduction in training time (≈96% vs GAP). O-RFF further reduces required features and runtime. Runtime is still slower than classical IPs but faster than GAP and RFF variants when using O-RFF.
Discussion
The proposed random-feature-based linear models address the scalability bottlenecks of kernel IPs while preserving high fidelity to DFT reference data. By projecting infinite-dimensional kernel feature spaces into low-dimensional randomized bases (RFF for stationary kernels and RFM for dot-product kernels), the models enable efficient linear least-squares training and O(M) inference, reducing training time by about 96% compared to GAP and facilitating hyperparameter tuning and active learning. Across diverse materials (graphene, diamond, tungsten), the approach achieves force MAE ~10^-2 eV Å^-1 and reproduces structural, elastic, and phonon properties near DFT and experimental values. O-RFF reduces variance and the number of features, improving runtime further with minimal loss of accuracy. The ability to combine different descriptor classes (two-body, three-body, SOAP, etc.) and to extend to multi-component materials (Q-RFF) demonstrates flexibility and transferability. Overall, the findings support the hypothesis that linear models with random features can replace costly kernel evaluations in ML interatomic potentials without sacrificing accuracy, thereby enabling scalable, high-throughput IP development and MD simulations.
Conclusion
This work introduces interatomic potentials based on randomized feature maps for stationary and non-stationary kernels, delivering DFT-level accuracy with substantially reduced computational cost relative to standard kernel methods. The approach:
- Provides accurate energies, forces, and phonon spectra for graphene, diamond, and tungsten, and generalizes to multi-element systems (Q-RFF) with very low force MAE.
- Reduces training time by ≈96% vs GAP and enables efficient prediction without storing training data.
- Demonstrates strong agreement with DFT/experiment for structural (lattice constants, cohesive energies), elastic (C_ij, Young’s modulus), and defect properties.
Future directions include: incorporating sparsity-inducing regularizers and Kronecker algebra to further cut runtime; optimizing feature constructions and orthogonality schemes; integrating additional descriptor sets; and extending active-learning workflows for complex, multi-element materials.
Limitations
- Runtime during MD remains slower than classical empirical IPs due to expensive trigonometric evaluations in RFF; although O-RFF reduces the number of features and improves speed, it can still be up to two orders of magnitude slower than classical IPs.
- RFF variants can require more parameters than sparse kernel methods, impacting prediction speed; orthogonal features mitigate but do not eliminate this.
- Some material properties (e.g., certain elastic constants like C12) show noticeable deviations from DFT/experiment, indicating sensitivity to training data coverage (e.g., strained configurations).
- The authors note that energy and force errors can grow with increasing training-set size in some settings, suggesting that careful curation and weighting of training data, together with regularization, is important.
- Sensitivity to the chosen descriptors and hyperparameters (cutoffs, kernel length scales) remains; performance depends on descriptor quality and training-data coverage.