Coupled cluster finite temperature simulations of periodic materials via machine learning

B. Herzog, A. Gallo, et al.

Basile Herzog, Alejandro Gallo, and colleagues present a method for finite-temperature coupled cluster simulations of periodic materials. By combining machine learning with correlated wavefunction methods, they predict thermodynamic properties such as the enthalpy of CO2 adsorption in a zeolite at a fraction of the brute-force cost and in close agreement with experiment.
Introduction

The study addresses the challenge of obtaining reliable finite-temperature properties of periodic materials at high electronic-structure accuracy. While density functional theory (DFT) is the workhorse for materials modeling, its predictions depend strongly on the chosen exchange–correlation functional. High-accuracy post-Hartree–Fock methods such as second-order Møller–Plesset perturbation theory (MP2) and coupled cluster with single, double, and perturbative triple excitations, CCSD(T), can deliver chemical accuracy but are prohibitively expensive for periodic systems and for finite-temperature sampling based on molecular dynamics (MD). The research question is whether strategies assisted by machine learning (ML) can make periodic CCSD(T) feasible for finite-temperature observables. The authors propose combining ML with thermodynamic perturbation theory (TPT) and Monte Carlo (MC) sampling to evaluate ensemble averages at MP2 and CCSD(T) accuracy using trajectories generated at the cheaper DFT level, and they test this on the enthalpy of adsorption of CO2 in protonated chabazite.

Literature Review

Recent advances have implemented MP2 and coupled cluster methods for periodic materials, but costs remain high, especially for finite-temperature sampling. ML-accelerated MD has enabled larger systems and longer time scales by learning interatomic potentials, yet typically requires large training datasets and becomes challenging at high levels of theory. Prior ML-assisted CCSD(T) work focused mainly on molecular systems and, more recently, small periodic boxes for liquid water, with limited data and scope. There has been no prior report of finite-temperature CCSD(T) applications to periodic solids such as zeolites. Earlier studies also introduced ML-based thermodynamic perturbation (MLPT) for reweighting DFT trajectories to higher-level theories (e.g., RPA) efficiently. This work extends these ideas to MP2 and CCSD(T) for a periodic zeolite adsorption problem, emphasizing data efficiency and reliability checks for configurational space overlap.

Methodology
  • Target property: Enthalpy of adsorption of CO2 in protonated chabazite at T = 300 K, computed as ΔH_ads = ⟨E(M@zeolite)⟩ − (⟨E(M)⟩ + ⟨E(zeolite)⟩) − k_B T, where M is the adsorbed molecule (here CO2) and ⟨·⟩ denotes a canonical ensemble average.
  • Reference (production) simulations: Ab initio MD at the PBE + D2 level in the NVT ensemble at 300 K with Andersen thermostat (collision probability 0.05), 0.5 fs timestep, 100 ps total length (200,000 configurations), first 10 ps discarded for equilibration; fixed cell parameters optimized at PBE level; VASP used for all AIMD and PBE+D2 single-point calculations; H masses set to 1.
  • High-level electronic structure: Periodic coupled cluster using CC4s interfaced with VASP. Workflow employs plane-wave basis with techniques for finite-basis and finite-size corrections, natural orbitals optimized via HF/MP2, followed by CCSD and perturbative (T) steps. For CCSD, 10 approximate natural orbitals per occupied orbital; for (T), 5 natural amplitudes. A single CCSD(T) calculation costs ~10,000 core-hours.
  • MLPT (Machine Learning Thermodynamic Perturbation):
    1. Generate a production trajectory with Hamiltonian H0 (PBE + D2) yielding energies E0(R_i).
    2. Reweight to target Hamiltonian HT (MP2 or CCSD(T)) using TPT: ensemble averages are computed as ⟨E_target⟩ = Σ_i w_i E_target(R_i) / Σ_i w_i, with weights w_i ∝ exp[−β ΔE(R_i)], where ΔE(R) = E_target(R) − E0(R) and β = 1/(k_B T). In practice, an ML model predicts ΔE(R) for the many configurations of the production trajectory, which reduces the number of expensive target calculations.
    3. Kernel ridge regression with SOAP descriptors (DScribe) is trained on differences between post-HF (MP2 or CCSD(T)) and PBE + D2 energies. Training set: 10,000 configurations evenly spaced along PBE + D2 trajectories; test set: 10 randomly chosen configurations. Hyperparameter tuning and accuracy in SI.
  • MLMC (Machine Learning Monte Carlo): To address potential limited overlap between production and target configurational spaces, perform Metropolis MC sampling directly in the target (CCSD(T)) canonical ensemble, replacing expensive CCSD(T) energies with ML predictions from the MLPT-trained model. Proposals include random translations (up to 0.5 Å) and rotations (up to 75°) for the adsorbed molecule; acceptance via Metropolis criterion based on ML-predicted energies. MLMC avoids TPT reweighting bias; however, it exhibits longer autocorrelation times than MLPT.
  • Overlap diagnostics: Use the Iw index to quantify configurational space overlap; values near 0.5 indicate optimal overlap, while small values indicate risk. Prior work suggests Iw ~ 0.03 can be sufficient for adsorption in zeolites. Reported values around 0.07 (host) and 0.05 (adsorbate) at CCSD(T) indicate acceptable overlap.
  • Validation analyses: t-SNE projections show comparable configurational spaces sampled by PBE + D2 MD (production) and CCSD(T) MLMC (target), with training data covering the relevant region. Radial distribution functions (Si–O pairs) are similar across PBE + D2, MP2, and CCSD(T) (via MLPT and MLMC), indicating no severe structural bias.
  • Static correction contrast: A simplified “static shift” approach assuming parallel energy surfaces (constant offset between DFT and post-HF) is contrasted with MLPT; it is shown to be unreliable generally and to mask important differences between MP2 and CCSD(T).
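
The reweighting step and the overlap check described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`tpt_weights`, `reweighted_mean`, `iw_index`, `adsorption_enthalpy`) are hypothetical, the weights use the standard TPT form w_i ∝ exp[−β ΔE(R_i)], and the I_w definition used here (the fraction of configurations, taken in order of decreasing weight, needed to accumulate half of the total weight) is an assumption chosen because it spans the 0 to 0.5 range mentioned above.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def tpt_weights(delta_e, temperature):
    """Normalized weights proportional to exp(-beta * dE), dE = E_target - E0.

    delta_e holds the (ML-predicted) energy differences in kcal/mol for
    configurations sampled with the production Hamiltonian.
    """
    beta = 1.0 / (K_B_KCAL * temperature)
    # Shift by the minimum so the largest exponent is zero (avoids overflow).
    shift = min(delta_e)
    raw = [math.exp(-beta * (de - shift)) for de in delta_e]
    total = sum(raw)
    return [r / total for r in raw]


def reweighted_mean(values, weights):
    """Weighted ensemble average; weights are assumed to sum to one."""
    return sum(v * w for v, w in zip(values, weights))


def iw_index(weights):
    """Assumed overlap index: share of configurations carrying half the weight.

    Equal weights give 0.5; a single dominant configuration gives about 1/N.
    """
    ordered = sorted(weights, reverse=True)
    half = 0.5 * sum(ordered)
    running = 0.0
    for count, w in enumerate(ordered, start=1):
        running += w
        if running >= half - 1e-12:
            return count / len(ordered)
    return 1.0


def adsorption_enthalpy(e_complex, e_molecule, e_host, temperature):
    """dH_ads = <E(M@zeolite)> - (<E(M)> + <E(zeolite)>) - k_B T, in kcal/mol."""
    return e_complex - (e_molecule + e_host) - K_B_KCAL * temperature
```

In use, the target-level average for each of the three systems would be `reweighted_mean([e0 + de for e0, de in zip(e0_list, delta_e)], tpt_weights(delta_e, 300.0))`, and the three averages would then be passed to `adsorption_enthalpy`.
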
Key Findings
  • Enthalpy of adsorption of CO2 in protonated chabazite at 300 K:
    • PBE + D2 (MD sampling): −9.72 ± 0.27 kcal mol⁻¹
    • MP2 (MLPT): −9.50 ± 0.24 kcal mol⁻¹
    • CCSD(T) (MLPT): −8.32 ± 0.28 kcal mol⁻¹
    • CCSD(T) (MLMC sampling): −8.09 ± 0.71 kcal mol⁻¹
    • Experiment: −8.41 kcal mol⁻¹
  • CCSD(T) results (both MLPT and MLMC) are in excellent agreement with experiment; MLMC confirms MLPT and indicates that the PBE + D2 trajectory provides a reliable starting point for reweighting.
  • The difference between MP2 and CCSD(T) enthalpies is ~1.2 kcal mol⁻¹, which a static correction approach would miss, underscoring the need for MLPT/MLMC rather than simple energy shifts.
  • Overlap metrics (Iw) and structural analyses (RDFs) support the reliability of MLPT for this system; MLMC cross-validation reduces concerns about reweighting bias.
  • Computational efficiency: The approach requires only tens of high-level single-point calculations to train data-efficient ML models, compared to infeasible brute-force CCSD(T) MD that would demand billions of CPU hours.
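
The cost estimate and the MP2 versus CCSD(T) gap follow directly from numbers quoted in this summary. The short calculation below only reproduces that arithmetic; the variable names are illustrative, and the brute-force figure assumes one CCSD(T) single point per configuration of a single 100 ps trajectory.

```python
# Numbers quoted in the summary above.
n_configurations = 200_000      # 100 ps trajectory at 0.5 fs per step
core_hours_per_ccsdt = 10_000   # approximate cost of one CCSD(T) single point

# Cost of evaluating CCSD(T) on every configuration of one trajectory.
brute_force_core_hours = n_configurations * core_hours_per_ccsdt

# Reported adsorption enthalpies in kcal/mol.
enthalpy = {
    "PBE+D2 (MD)": -9.72,
    "MP2 (MLPT)": -9.50,
    "CCSD(T) (MLPT)": -8.32,
    "CCSD(T) (MLMC)": -8.09,
}
experiment = -8.41

# Gap between the two post-HF levels and deviation of each method from experiment.
mp2_ccsdt_gap = enthalpy["CCSD(T) (MLPT)"] - enthalpy["MP2 (MLPT)"]
deviation = {method: value - experiment for method, value in enthalpy.items()}

if __name__ == "__main__":
    print(f"brute-force cost: {brute_force_core_hours:.1e} core-hours")
    print(f"MP2 to CCSD(T) gap: {mp2_ccsdt_gap:.2f} kcal/mol")
    for method, dev in deviation.items():
        print(f"{method}: {dev:+.2f} kcal/mol vs experiment")
```
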
Discussion

The results show that ML-accelerated CCSD(T) and MP2 can provide accurate finite-temperature adsorption enthalpies for periodic materials, addressing the core question of feasibility and reliability. The agreement of CCSD(T) with experiment and the consistency between MLPT and MLMC indicate that the PBE + D2 production trajectory sufficiently overlaps with the target CCSD(T) ensemble for this system. Diagnostics (Iw), t-SNE projections, and RDFs corroborate that configurational spaces are comparable and that MLPT reweighting is sound. Moreover, the observed ~1.2 kcal mol⁻¹ difference between MP2 and CCSD(T) highlights non-trivial surface deformations that would be missed by static energy shifts, demonstrating the importance of ensemble-aware corrections. While MLMC exhibits larger statistical uncertainty due to longer autocorrelation times, it validates MLPT and removes reweighting bias, thereby reinforcing confidence in the CCSD(T) prediction. Overall, integrating ML with TPT and MC sampling enables practical evaluation of finite-temperature observables at correlated wavefunction accuracy in periodic systems.
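
The MLMC idea discussed above, Metropolis sampling driven by a cheap surrogate energy, can be sketched as follows. This is a toy illustration rather than the authors' implementation: `toy_energy` is a hypothetical harmonic well standing in for the ML-predicted CCSD(T) energy, only translational moves of a single point are shown (rotations are omitted), and the 0.5 Å limit is applied per Cartesian component as a simplification.

```python
import math
import random

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1
FORCE_CONSTANT = 2.0     # kcal mol^-1 A^-2, arbitrary value for the toy model


def toy_energy(position):
    """Hypothetical stand-in for the ML energy: harmonic well at the origin."""
    return 0.5 * FORCE_CONSTANT * sum(x * x for x in position)


def metropolis_mc(energy_fn, start, n_steps, temperature=300.0,
                  max_shift=0.5, seed=0):
    """Sample the canonical ensemble of energy_fn with random translations."""
    rng = random.Random(seed)
    beta = 1.0 / (K_B_KCAL * temperature)
    current = list(start)
    e_current = energy_fn(current)
    energies = []
    accepted = 0
    for _ in range(n_steps):
        # Propose a random displacement of up to max_shift per component.
        trial = [x + rng.uniform(-max_shift, max_shift) for x in current]
        e_trial = energy_fn(trial)
        d_e = e_trial - e_current
        # Metropolis criterion: accept with probability min(1, exp(-beta * dE)).
        if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
            current, e_current = trial, e_trial
            accepted += 1
        # A rejected move counts the old configuration again.
        energies.append(e_current)
    return energies, accepted / n_steps
```

For the harmonic toy model the sampled mean energy should approach the equipartition value 3/2 k_B T, which gives a quick sanity check of the acceptance rule before a real surrogate model is plugged in.
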

Conclusion

This work demonstrates a practical route to compute finite-temperature properties of periodic materials at CCSD(T) accuracy by combining an efficient periodic coupled-cluster implementation with data-efficient ML models, thermodynamic perturbation theory (MLPT), and ML-driven Monte Carlo sampling (MLMC). Applied to CO2 adsorption in protonated chabazite, the CCSD(T) enthalpy of adsorption agrees very well with experiment, and MLMC corroborates MLPT-based predictions. The approach drastically reduces the number of expensive post-HF calculations required and opens avenues for broader use of high-accuracy methods in materials simulations. Future work will extend these techniques to other materials and properties, including free energies of activation for catalytic reactions and more complex adsorption and reaction processes, while further improving sampling efficiency and force availability at the post-HF level.

Limitations
  • Despite major cost reductions, the workflow remains significantly more expensive than standard DFT-based simulations.
  • MLPT accuracy depends on sufficient overlap between production (DFT) and target (post-HF) configurational spaces; insufficient overlap can bias results and demands diagnostics (Iw) or resorting to MLMC.
  • MLMC removes TPT bias but suffers from longer autocorrelation times, leading to larger statistical uncertainties for a given sampling length.
  • Current CCSD(T) implementations may lack readily available forces, preventing ML models trained directly on forces and complicating fully dynamical CCSD(T) simulations.
  • Study focuses on a relatively small periodic system (up to ~40 atoms in the unit cell) as a proof of principle; broader generalization to larger or more complex materials will require further validation and optimization.
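
The link between autocorrelation time and statistical uncertainty noted in the limitations can be illustrated with block averaging, a standard technique that is used here only as an example; the summary does not state how the authors estimated their error bars. The helper names and the synthetic AR(1) series are hypothetical.

```python
import math
import random


def naive_stderr(series):
    """Standard error of the mean assuming uncorrelated samples."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / (n - 1)
    return math.sqrt(variance / n)


def block_stderr(series, block_size):
    """Standard error from block means; valid once blocks exceed the
    autocorrelation time of the series."""
    n_blocks = len(series) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_means = [
        sum(series[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(n_blocks)
    ]
    return naive_stderr(block_means)


def ar1_series(n, phi, seed=0):
    """Synthetic correlated data: x_t = phi * x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

With `phi = 0.9` the block estimate comes out several times larger than the naive one, while for `phi = 0.0` the two roughly coincide, which mirrors why a sampler with longer autocorrelation times reports wider error bars for the same number of steps.
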