Introduction
The design of efficient and cost-effective heterogeneous catalysts is crucial for meeting growing energy demands while addressing climate change. Computational catalysis offers a powerful tool for high-throughput screening of materials, complementing experimental studies. A central task in computational catalysis is the accurate calculation of adsorption energies, which represent the energy associated with an adsorbate interacting with a catalyst surface. Adsorption energies are vital for determining reaction pathways and serve as descriptors that correlate with experimental outcomes such as activity and selectivity. Calculating an adsorption energy means finding the global minimum energy across all possible adsorbate placements and configurations. Traditional methods rely on Density Functional Theory (DFT) calculations, which are computationally expensive, scaling as O(N³) with the number of electrons. Because the global minimum is typically sought by exploring many configurations chosen through heuristics or intuition, and each configuration requires its own DFT relaxation, the approach does not scale well to high-throughput screening. This paper proposes using machine learning (ML) potentials to accelerate the search for low-energy adsorbate-surface configurations, combining the strengths of ML and DFT in a hybrid approach.
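As a compact restatement of the quantity being computed, the adsorption energy can be written as a minimum over candidate configurations. The symbols here are assumptions introduced for illustration and are not taken from the summary above: E_sys,i is the relaxed energy of the adsorbate-plus-surface system in configuration i, E_slab is the energy of the clean surface, and E_gas is the gas-phase reference energy of the adsorbate.

```latex
E_{\mathrm{ads}} \;=\; \min_{i}\,\bigl( E_{\mathrm{sys},i} - E_{\mathrm{slab}} - E_{\mathrm{gas}} \bigr)
```

Under this convention, every configuration i that is left out of the search can only leave the estimate unchanged or too high, which is why the sampling of configurations matters as much as the accuracy of each energy.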
Literature Review
Prior work has relied on expert intuition or heuristics, such as those based on surface symmetry, to identify low-energy adsorbate-surface configurations. While these methods have proven successful in some cases, they do not scale well as surfaces and adsorbates become more complex. Graph-based methods have also been explored to identify unique configurations, but the computational cost of DFT remains a significant bottleneck. Recent ML potentials offer a promising alternative, enabling orders-of-magnitude speedups compared to DFT. Existing ML potentials have shown progress on standard benchmarks, but reaching the accuracy needed for reliable screening remains a challenge. Previous attempts to use ML to accelerate the search for low-energy configurations have often relied on bespoke models for each adsorbate/catalyst combination, limiting broader applicability. This paper addresses these limitations by employing generalizable ML potentials.
Methodology
The authors introduce the AdsorbML algorithm, a hybrid approach combining ML potentials and DFT calculations to estimate adsorption energies. The algorithm first generates a large number of candidate adsorbate configurations using heuristic and random placement strategies. ML potentials are then used to relax these configurations. The best k configurations (those with the lowest ML-predicted energies) are selected for refinement using either DFT single-point calculations (ML+SP) or full DFT relaxations (ML+RX). The final adsorption energy is the minimum energy among these k DFT calculations. To benchmark different methods, the authors introduce the Open Catalyst 2020-Dense (OC20-Dense) dataset, which contains densely sampled configurations for ~1000 unique adsorbate-surface combinations. The dataset was generated using DFT calculations, requiring ~4 million CPU hours. Several Graph Neural Network (GNN) models, previously benchmarked on the Open Catalyst 2020 (OC20) dataset, were evaluated on OC20-Dense using AdsorbML. Two metrics were used: success rate (the percentage of systems where the predicted adsorption energy is within 0.1 eV of, or lower than, the DFT minimum) and DFT speedup (the ratio of DFT electronic steps used by the DFT-Heuristic+Random baseline to those used by the hybrid ML+DFT strategy). Relaxation constraints were applied to keep the adsorption energies valid, filtering out configurations involving desorption, dissociation, or significant surface mismatches.
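The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Relaxed` record, the `adsorbml_select` function, and the toy energies are all hypothetical, the ML relaxation is assumed to have already run, and the DFT step is stood in for by a plain lookup.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Relaxed:
    """One ML-relaxed configuration (hypothetical record for illustration)."""
    config_id: int
    ml_energy: float  # ML-predicted energy after relaxation, in eV
    valid: bool       # False if desorbed, dissociated, or surface mismatched


def adsorbml_select(
    relaxed: Sequence[Relaxed],
    dft_energy: Callable[[int], float],
    k: int,
) -> float:
    """Return the minimum DFT energy over the k lowest-ML-energy valid configs.

    `dft_energy` stands in for a DFT single-point (ML+SP) or a full DFT
    relaxation (ML+RX) on the configuration with the given id.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Drop configurations that violate the relaxation constraints.
    valid = [r for r in relaxed if r.valid]
    if not valid:
        raise ValueError("no valid configurations to evaluate")
    # Rank by ML energy and keep the best k for DFT refinement.
    top_k = sorted(valid, key=lambda r: r.ml_energy)[:k]
    # The reported adsorption energy is the lowest of the k DFT values.
    return min(dft_energy(r.config_id) for r in top_k)


# Toy data: made-up energies, chosen only to exercise the logic.
toy_relaxed = [
    Relaxed(0, -1.20, True),
    Relaxed(1, -1.50, False),  # lowest ML energy, but filtered as invalid
    Relaxed(2, -1.10, True),
    Relaxed(3, -0.40, True),
]
toy_dft = {0: -1.05, 1: -1.45, 2: -1.15, 3: -0.50}

best_k1 = adsorbml_select(toy_relaxed, toy_dft.__getitem__, k=1)
best_k2 = adsorbml_select(toy_relaxed, toy_dft.__getitem__, k=2)
```

In this toy case the ML ranking and the DFT ranking disagree on the two best valid configurations, so k=1 returns -1.05 eV while k=2 recovers the lower value of -1.15 eV. That is the trade-off k controls: each extra candidate costs one more DFT calculation but guards against ML ranking errors.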
Key Findings
The evaluation of various GNN models on OC20-Dense showed that eSCN-MD-Large and GemNet-OC-MD-Large achieved the highest success rates. With the ML+SP strategy (single-point DFT calculations on ML-relaxed structures), eSCN-MD-Large achieved a success rate of 88.27% at k=5, slightly exceeding the DFT-Heuristic baseline, with a speedup of 1384x. A more balanced trade-off at k=3 yielded a success rate of 87.36% and a speedup of 2296x. The ML+RX strategy (full DFT relaxations starting from ML-relaxed structures) gave even higher success rates (e.g., 90.60% for eSCN-MD-Large at k=5), at the cost of a much smaller speedup (215x). Analysis of the distribution of predictions showed that the most accurate models did not necessarily find significantly lower minima than DFT, suggesting that some noise in ML predictions can help reach regions of the potential energy surface that would otherwise go unsampled. Performance was consistent across the dataset subsplits (ID, OOD-Adsorbate, OOD-Catalyst, OOD-Both), indicating good generalization. Experiments also showed that adding random configurations to the heuristic ones significantly improved the success rate, highlighting the importance of diverse sampling.
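The two reported metrics are simple to compute once per-system energies and DFT step counts are available. The sketch below is illustrative only: the function names and the toy numbers are hypothetical, and it assumes a one-sided reading of the success criterion, in which a prediction counts as a success when it is no more than `tol` above the reference minimum (so energies below the reference also count). Replace the comparison with `abs(p - r) <= tol` for a symmetric window.

```python
from typing import Sequence


def success_rate(
    predicted: Sequence[float],
    reference: Sequence[float],
    tol: float = 0.1,
) -> float:
    """Percentage of systems whose predicted energy is at most `tol` eV
    above the reference DFT minimum (one-sided criterion, an assumption)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, r in zip(predicted, reference) if p - r <= tol)
    return 100.0 * hits / len(predicted)


def dft_speedup(baseline_steps: int, hybrid_steps: int) -> float:
    """Ratio of DFT electronic steps: baseline search over hybrid ML+DFT."""
    if hybrid_steps <= 0:
        raise ValueError("hybrid_steps must be positive")
    return baseline_steps / hybrid_steps


# Toy numbers, made up for illustration (energies in eV).
toy_pred = [-1.15, -0.42, -2.30, -0.90]
toy_ref = [-1.20, -0.60, -2.25, -0.95]

rate = success_rate(toy_pred, toy_ref)      # 3 of 4 systems succeed
speedup = dft_speedup(1_000_000, 500)       # baseline used 2000x more steps
```

Here the second system misses by 0.18 eV and counts as a failure, while the third system lands 0.05 eV below the reference and counts as a success under the one-sided criterion.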
Discussion
The findings demonstrate that AdsorbML provides a spectrum of accuracy-efficiency trade-offs for adsorption energy calculations. The algorithm sharply reduces the amount of DFT computation required while maintaining high accuracy, making it suitable for high-throughput screening. For instance, given the reported ML+SP speedups of roughly 1400x to 2300x, a fixed DFT budget could cover on the order of a thousand times more systems than a DFT-only search. The observation that ML models sometimes find lower minima than DFT suggests that noise in ML predictions can aid in exploring potentially advantageous configurations. The consistent performance across dataset splits underscores the algorithm's generalizability to adsorbates and catalysts not included in the training data.
Conclusion
This work presents AdsorbML, a highly efficient hybrid algorithm for computing adsorption energies, and the OC20-Dense dataset for benchmarking. AdsorbML offers various accuracy-efficiency trade-offs. Future research directions include exploring more efficient global optimization methods and further improving the accuracy of ML potentials. The release of the OC20-Dense dataset and evaluation server will further facilitate progress in this area.
Limitations
The enumeration of configurations in the current study, while more extensive than traditional heuristic methods, is not exhaustive. Without DFT single-point verification, the success-rate metric can be gamed by a model that simply predicts low energies. While the models used show promise for idealized adsorbate-surface systems, fine-tuning is needed to extend their applicability to other systems and levels of theory.