Completing density functional theory by machine learning hidden messages from molecules

R. Nagai, R. Akashi, et al.

This research by Ryo Nagai, Ryosuke Akashi, and Osamu Sugino presents a method for constructing the exchange-correlation energy functional in Kohn-Sham DFT using machine learning. The resulting functionals reach accuracy on par with traditional functionals across numerous molecules, extending the capabilities of DFT.
Introduction

Machine learning (ML) can learn complex mappings from sampled data and has proven effective across materials problems, often predicting properties directly from atomic configurations. Such models, however, may lack transferability when applied to structures or elements outside the training set. ML schemes that take the electron density as input show greater transferability because the density encodes richer physical information than scalar properties such as the total energy. Kohn-Sham (KS) DFT is the standard method for electronic structure calculations. Its exchange-correlation potential Vxc is in principle a functional of the full density distribution, but the exact form is unknown. Existing approximations (LSDA, GGA, meta-GGA, hybrids) climb Jacob’s ladder, yet they still face transferability issues and may be biased toward energy accuracy at the expense of density accuracy. Given the abundance of accurate densities and energies from theory and experiment, the authors propose constructing Vxc by ML within the KS framework, using the electron density as input for improved generalization. They build on prior demonstrations in model systems and aim to make direct ML of Vxc feasible for real materials.

Literature Review

The paper situates its contribution within several strands of prior work: (1) ML in materials science for property prediction and ML-based interatomic potentials, which can suffer from limited transferability beyond their training sets. (2) ML approaches that take the electron density as input, which have demonstrated superior transferability and motivate density-based ML models. (3) Pioneering work by Burke and co-workers, who constructed machine-learned Hohenberg-Kohn functionals for orbital-free DFT, in contrast to the present KS-based approach targeting Vxc. (4) The Jacob’s ladder hierarchy of approximations (LSDA, GGA, meta-GGA, hybrids), where transferability issues and trade-offs between energy and density accuracy persist. (5) The authors’ previous work, which machine-learned Vxc in a 1D two-body model using exact diagonalization data and showed that treating the kinetic energy explicitly suppresses spurious oscillations in the learned Vxc and improves density predictions, motivating the present extension to real molecular systems.

Methodology

Overview: The authors construct exchange-correlation (xc) energy densities using a feed-forward neural network (NN) in a (semi-)local form suitable for KS-DFT. The functional derivative is evaluated via back-propagation to obtain Vxc[n] for self-consistent KS calculations.
Functional form: The xc energy is written as E_xc[n] = ∫ dr n(r) E_xc^{g[n]}(r), where the input descriptor vector g[n] contains local and near-local density descriptors. Four approximation levels are implemented via the choice of g: LSDA, GGA, meta-GGA, and a near region approximation (NRA) that augments meta-GGA with a nonlocal descriptor R(r) = ∫ dr' n(r') exp(-|r - r'|/σ), with σ = 0.2 bohr. This nonlocal term captures the averaged density around r, reflecting nonlocal exchange-correlation effects.
Neural network mapping: A fully connected feed-forward NN with H hidden layers maps u = g[n] to v = E_xc(r). Each layer applies an affine transform followed by a nonlinear activation (exponential linear unit). The xc energy density is formulated as E_xc(n, g) = - n^{1/3} [1 + (2/3)(1 - ζ) f_NN(g)], embedding minimal physical constraints (Slater exchange scaling and spin polarization dependence) and learning a correction term via the NN G_NN(g) = 1 + h4(...h1(g)...). The last layer is constrained to keep E_xc nonpositive. Inputs are preprocessed to dimensionless, variance-regularized forms: n → log n; ζ → −log{ sqrt(1+ζ) + sqrt(1−ζ) }; s → log s; r-like variables → log r.
Descriptors: LSDA uses n and the spin polarization ζ; GGA adds the reduced gradient s; meta-GGA adds the kinetic-energy-density-related descriptor τ and other standard meta-GGA variables; NRA extends meta-GGA with the nonlocal R(r). The functional derivative δE_xc/δn(r) is computed via back-propagation. For NRA it includes a domain integral over r' due to the nonlocal R(r), evaluated on the same numerical grid as the xc integration (the cost of this step scales quadratically with system size).
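The pieces described above (the nonlocal descriptor R(r), the ELU network, and the grid integration of the xc energy) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it is spin-unpolarized, uses only log n and log R as descriptors, takes the energy density in the simplified form −n^{1/3}·G with G = 1 + (network output), and uses random untrained weights. The function names, the toy grid, and the Gaussian density are hypothetical. The functional derivative via back-propagation is omitted.

```python
import numpy as np

SIGMA = 0.2  # bohr, decay length of the nonlocal descriptor quoted in the text


def elu(x):
    """Exponential linear unit with alpha = 1; its output is never below -1."""
    return np.where(x > 0.0, x, np.expm1(np.minimum(x, 0.0)))


def near_region_descriptor(points, weights, density, sigma=SIGMA):
    """R(r_i) = sum_j w_j n(r_j) exp(-|r_i - r_j| / sigma) on a quadrature grid.

    The pairwise distance matrix makes this step quadratic in the grid size.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.exp(-dist / sigma) @ (weights * density)


def init_params(n_inputs, n_hidden=100, n_layers=4, seed=0):
    """Random weights for a fully connected net (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_hidden] * (n_layers - 1) + [1]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(k, m)), np.zeros(k))
        for m, k in zip(sizes[:-1], sizes[1:])
    ]


def enhancement_factor(g, params):
    """G = 1 + h_H(...h_1(g)...); each layer is an affine map followed by ELU."""
    h = g
    for W, b in params:
        h = elu(h @ W.T + b)
    return 1.0 + h[:, 0]  # ELU >= -1, so G >= 0 (assumed way to keep E_xc <= 0)


def xc_energy(weights, density, g, params):
    """E_xc = sum_i w_i n_i eps_i with eps_i = -n_i^(1/3) * G_i (simplified form)."""
    eps = -np.cbrt(density) * enhancement_factor(g, params)
    return float(np.sum(weights * density * eps))


# Toy grid: random points in a cube with equal weights and a Gaussian density.
rng = np.random.default_rng(1)
points = rng.uniform(-2.0, 2.0, size=(200, 3))
weights = np.full(200, 4.0**3 / 200)
density = np.exp(-np.sum(points**2, axis=1))

R = near_region_descriptor(points, weights, density)
g = np.column_stack([np.log(density), np.log(R)])  # log-preprocessed inputs
params = init_params(n_inputs=g.shape[1])
E_xc = xc_energy(weights, density, g, params)
```

With four layers of 100 units the weight shapes come out as 100×N, 100×100, 100×100, and 1×100, where N is the number of descriptors. Applying ELU on the output layer is just one simple way to keep the energy nonpositive; the text does not specify how the authors constrain the last layer.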
Training data: Three reference molecules were selected to diversify structure, polarity, and spin: H2O, NH3, and spin-polarized NO. Training targets are atomization energies (AE) from Gaussian-2 (G2) and electron density distributions (DD) from CCSD, computed with the 6-311++G(3df,3pd) basis set. Atomization energies are used instead of total energies because AE errors are smaller and benefit from error cancellation within (semi-)local approximations.
Optimization: NN parameters are trained via a Metropolis-type Monte Carlo update scheme. At each iteration, random perturbations to the NN weights are proposed; KS-DFT calculations for the three molecules and their constituent atoms (for AE) are performed self-consistently with the current NN functional; a cost function is evaluated and used in an acceptance criterion. The cost combines squared deviations of AE from G2 (in hartree, normalized by E0=1 hartree) and density errors relative to CCSD, with weighting c2/c1=10. The density error metric for molecule M is (1/Ne) ∫ dr | n_DFT(r) − n_CCSD(r) |^2. Training proceeds for ~300 iterations while linearly decreasing the proposal magnitude and temperature (initial T=0.1, δw=0.01 to final T=0.06, δw=0.005), in parallel over 160 threads, and the parameters that minimize the cost are selected.
Implementation details: DFT and CCSD calculations used PySCF (v1.6.2) with standard numerical integration grids (Lebedev angular, Treutler radial) and Becke partitioning. Radial/angular grid sizes varied by element (e.g., (50,302) for H; (75,302) for second-row; up to (105,434) for third-row). The NN was implemented in PyTorch, with back-propagation for derivatives. Typical NN size for meta-GGA: 4 hidden layers with 100 units each (matrices W1: 100×N, W2, W3: 100×100, W4: 1×100; biases b1–b3: 100, b4: 1). Network-size dependence was explored (H=3, Nh=50; H=4, Nh=100; H=5, Nh=200), showing diminishing returns beyond H=4, Nh=100.
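The optimization loop can be sketched as a simulated-annealing style Metropolis search. This is a hedged illustration, not the authors' code: the acceptance rule is the standard Metropolis criterion exp(−Δcost/T), which is an assumption. The assignment of c1 to the AE term and c2 to the density term is also assumed. A toy quadratic stands in for the self-consistent KS-DFT evaluation that the real cost requires at every step. All function names are hypothetical.

```python
import numpy as np


def density_error(weights, n_dft, n_ref, n_electrons):
    """(1/Ne) * integral |n_DFT - n_ref|^2 dr, evaluated as a quadrature sum."""
    diff = np.asarray(n_dft) - np.asarray(n_ref)
    return float(np.sum(np.asarray(weights) * diff**2) / n_electrons)


def total_cost(ae_dft, ae_ref, dens_errs, c1=1.0, c2=10.0, e0=1.0):
    """c1 * sum((dAE / E0)^2) + c2 * sum(density errors), with c2/c1 = 10."""
    ae_term = np.sum(((np.asarray(ae_dft) - np.asarray(ae_ref)) / e0) ** 2)
    return float(c1 * ae_term + c2 * np.sum(dens_errs))


def metropolis_train(cost_fn, w0, n_iter=300, t_start=0.1, t_end=0.06,
                     dw_start=0.01, dw_end=0.005, seed=0):
    """Random-walk search with linearly decreasing temperature and step size."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    cost = cost_fn(w)
    best_w, best_cost = w.copy(), cost
    for it in range(n_iter):
        frac = it / max(n_iter - 1, 1)
        temp = t_start + frac * (t_end - t_start)
        step = dw_start + frac * (dw_end - dw_start)
        trial = w + step * rng.standard_normal(w.shape)
        trial_cost = cost_fn(trial)
        # Assumed standard Metropolis rule: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / T).
        delta = trial_cost - cost
        if delta <= 0.0 or rng.random() < np.exp(-delta / temp):
            w, cost = trial, trial_cost
            if cost < best_cost:
                best_w, best_cost = w.copy(), cost
    return best_w, best_cost


def toy_cost(w):
    """Hypothetical stand-in for the KS-DFT based cost; minimum at w = 0.05."""
    return float(np.sum((w - 0.05) ** 2))


w_init = np.zeros(10)
w_best, c_best = metropolis_train(toy_cost, w_init)
```

Tracking the best parameters seen so far mirrors the statement that the parameters minimizing the cost are selected at the end of training. In the actual scheme each cost evaluation involves self-consistent calculations for the three molecules and their atoms, which is why the search is run in parallel.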
Self-consistency considerations: Because NN-based functionals can extrapolate poorly far from training densities, initial KS density guesses are important; superposition of atomic densities was used to ensure convergence.

Key Findings
  • Broad benchmark performance: Across hundreds of molecular systems not used in training, and across several properties (AE, density distribution, total energy, ionization potentials, and reaction barrier heights), the NN-based functionals matched or exceeded representative LSDA, GGA, meta-GGA, and even hybrid functionals.
  • Quantitative highlights (mean absolute errors):
    • NN-NRA: AE147 3.7 kcal/mol; AEHC28 2.2 kcal/mol; DD147 0.0011; TE147 0.08 hartree; IP13 1.5 kcal/mol; BH76 5.5 kcal/mol.
    • NN–meta-GGA: AE147 4.7 kcal/mol; AEHC28 3.5 kcal/mol; DD147 0.0011; TE147 0.14 hartree; IP13 1.8 kcal/mol; BH76 4.7 kcal/mol.
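    • SCAN (meta-GGA, for comparison): AE147 6.1 kcal/mol; AEHC28 6.9 kcal/mol; DD147 0.0014; TE147 0.28 hartree; IP13 3.7 kcal/mol; BH76 7.7 kcal/mol.
    • B3LYP (hybrid, for comparison): AE147 4.5 kcal/mol; AEHC28 2.4 kcal/mol; DD147 0.0015; TE147 0.36 hartree; IP13 3.8 kcal/mol; BH76 4.7 kcal/mol.
    • PBE0 (hybrid, for comparison): AE147 5.3 kcal/mol; AEHC28 9.0 kcal/mol; DD147 0.0011; TE147 0.23 hartree; IP13 3.2 kcal/mol; BH76 5.0 kcal/mol.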
  • Nonlocality via NRA: Adding the simple nonlocal descriptor R(r) yields performance comparable to hybrids for many benchmarks without explicit Hartree-Fock exchange.
  • Transferability: Despite training on only H2O, NH3, and NO (no carbon), the functionals perform well on hydrocarbons (AEHC28) and systems with delocalized electrons (e.g., benzene, butadiene), with errors decreasing as descriptor richness increases (LSDA → GGA → meta-GGA → NRA). This indicates the NN learns to distinguish localized vs. delocalized electron environments.
  • Density–energy link: Improved density accuracy (DD147) correlates with improved energies (AE147, TE147) and IPs, consistent with the Hohenberg–Kohn theorem and known relationships between accurate densities, potentials, and frontier orbital energies.
  • LSDA improvement: NN-LSDA significantly outperforms SVWN LSDA, effectively representing an LSDA calibrated to molecular systems. As descriptor dimensionality increases, the multivaluedness of g→E_xc diminishes, improving accuracy.
  • Out-of-training structures: NN–meta-GGA reproduces dissociation curves for C2H2 and N2 (bond breaking) and associated density transformations, even though training included only equilibrium structures.
  • Network size: Increasing NN size improves performance up to a point (H=4, Nh=100), after which gains saturate, guiding a practical model size.
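The benchmark figures above are mean absolute errors over a test set. As a small sketch of how such a number is formed, the snippet below computes an MAE from energies in hartree and converts it to kcal/mol. The three energy values are hypothetical placeholders, not data from the paper.

```python
import numpy as np

HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion factor


def mean_absolute_error(predicted, reference):
    """Mean of |predicted - reference| over all benchmark entries."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


# Hypothetical atomization energies in hartree (placeholders for illustration).
ae_pred_hartree = [0.370, 0.475, 0.240]
ae_ref_hartree = [0.372, 0.470, 0.243]

mae_hartree = mean_absolute_error(ae_pred_hartree, ae_ref_hartree)
mae_kcal = mae_hartree * HARTREE_TO_KCAL_PER_MOL
```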
Discussion

The results demonstrate that using the electron density as the ML input within the KS framework yields strong transferability from a minimal training set to diverse molecules and properties. Because the KS equations are solved explicitly, the kinetic energy operator regularizes potential artifacts in the ML-predicted Vxc, enhancing robustness for out-of-training configurations (e.g., bond dissociation). The systematic enrichment of descriptors (from LSDA to GGA to meta-GGA, then adding the nonlocal R) reduces the ambiguity in mapping local environments to E_xc, producing consistent improvements in both density and energetic properties. The NRA formulation shows that modest nonlocality introduced via a simple density average can approach hybrid-level accuracy without incurring the cost of Hartree-Fock exchange, while keeping the explicit density-functional form of semilocal DFT. The observed correlation between density accuracy and energetic observables (TE, AE, IP) indicates that training on density is not only useful for fitting many NN parameters but also directly beneficial for predictive accuracy across properties. Cases with reduced accuracy, such as SiH4 and CCl4, suggest that expanding the training set to include underrepresented bonding motifs (e.g., tetrahedral coordination) can systematically close the remaining gaps. Overall, the approach provides a data-driven path to improve DFT functionals while retaining physical constraints and computational tractability.

Conclusion

The study presents a machine-learning approach to construct exchange-correlation functionals for KS-DFT by mapping local and near-local electron-density descriptors to the xc energy density using neural networks, trained on accurate densities and atomization energies of a few reference molecules. The resulting functionals, including a novel NRA with a simple nonlocal descriptor, achieve accuracy comparable to or better than widely used semilocal and hybrid functionals across large molecular benchmarks, including properties and structures not used in training. The framework enables systematic improvement by adding descriptors and training data with minimal ad hoc assumptions, preserving the KS equation’s computational efficiency. Future work can target systems dominated by complex nonlocal effects: dispersion (van der Waals) interactions, self-interaction errors (conventionally addressed with range-separated hybrids), and strong correlations (conventionally addressed with DFT+U). The flexibility of the ML-based functional form allows appropriate nonlocal descriptors and training data to be incorporated for these cases.

Limitations
  • Training set size and coverage: Only three molecules (H2O, NH3, NO) were used for training; certain bonding motifs and geometries (e.g., tetrahedral SiH4, CCl4) are underrepresented, leading to reduced accuracy. Accuracy dependence on training set composition remains to be systematically explored.
  • Convergence and extrapolation: NN-based functionals may exhibit convergence issues when applied to densities far from training distributions; reliable self-consistent convergence can require careful initial density guesses (e.g., superposition of atomic densities).
  • Computational scaling for nonlocal term: The NRA’s nonlocal descriptor introduces an O(N_grid^2) step in evaluating Vxc due to spatial integrals, increasing computational cost with system size compared to purely semilocal forms.
  • Total energy within semilocal forms: Reproducing absolute total energies is intrinsically more challenging for semilocal approximations; while TE accuracy improved, AE was preferred for training due to larger typical TE errors in standard functionals.
  • Physical constraints: Only minimal physical conditions were embedded; additional exact constraints and norming strategies could further enhance robustness and interpretability.