Introduction
Atmospheric CO2 concentrations, which exceeded 420 ppm in June 2022, make effective carbon capture and storage (CCS) technologies a necessity. Metal-organic frameworks (MOFs) show promise because of their tunability and high CO2 sorption capacity, but evaluating the vast space of possible MOFs is computationally expensive. Direct air capture (DAC) poses a particular challenge: the CO2 partial pressure is low and the water vapor partial pressure is high, so adsorbents must be highly selective for CO2 over water. Machine learning (ML) can accelerate screening by predicting CO2 adsorption from computationally generated data. Previous studies have shown that ML can predict gas adsorption properties in MOFs, but efficient prediction of CO2 capture performance under DAC conditions still needs improvement. This study addresses that gap by developing new descriptors that improve the accuracy and efficiency of ML models for predicting CO2 adsorption at low partial pressures.
Literature Review
Several studies have demonstrated the potential of machine learning for predicting gas adsorption and separation properties in MOFs. Aghaji et al. used a classification model based on geometric descriptors to identify MOFs for methane purification, achieving high accuracy. Anderson et al. computationally constructed 400 MOFs and combined density functional theory (DFT) calculations and grand canonical Monte Carlo (GCMC) simulations with ML models to predict CO2 capture metrics, reaching R² values as high as 0.905. However, these studies focused primarily on higher CO2 partial pressures, leaving a gap in predicting performance under DAC conditions, where low partial pressures and high humidity are significant challenges. This study aims to bridge that gap by developing and evaluating an ML model tailored to low-pressure CO2 capture.
Methodology
The dataset comprised 3378 structures from the CORE MOF dataset and 936 from the Anion-pillared MOF dataset. Partial charges were calculated using DFT and the density-derived electrostatic and chemical (DDEC) method. Grand canonical Monte Carlo (GCMC) simulations with the RASPA package provided the target variable, CO2 uptake, at three CO2 partial pressures (40 Pa, 1 kPa, and 4 kPa). Five groups of descriptors were used: atom type, geometric, chemical, Effective Point Charge (EPoCh), and energy (Henry coefficient). Atom type, geometric, and chemical descriptors are established features that have been used in earlier ML models for similar properties. The newly developed EPoCh descriptor aims to quantify the effect of atomic partial charges on CO2 uptake. It is built in three steps: adsorption is simulated around a single hypothetical atom at varying charges and pressures, a surface is fitted to these results, and the uptake calculated from that surface is averaged over all atoms in the MOF structure. The Henry coefficient, which represents the strength of the gas-adsorbent interaction, was also included as a descriptor. A Random Forest (RF) algorithm served as the ML model and was evaluated using R² and RMSE. The model's pseudo-classification performance was also assessed: recall and precision measured its ability to identify MOFs that exceed a specified CO2 uptake threshold. To compare the computational efficiency of the descriptors, the time required to compute each one was measured for the anion-pillared MOF subset.
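The three-step construction of the EPoCh descriptor can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the Langmuir-like form of the fitted surface, and its coefficients are assumptions made here for illustration only, and the partial charges are toy values rather than DDEC results.

```python
import math


def single_atom_uptake(charge, pressure_pa):
    """Stand-in for the surface fitted to single-atom adsorption results.

    ASSUMPTION: this functional form and its coefficients are invented for
    illustration; the study fits its own surface to simulation data.
    """
    # Hypothetical affinity that grows with the magnitude of the charge.
    affinity = 1.0e-4 * math.exp(2.0 * abs(charge))
    # Langmuir-like saturation with pressure (also an assumption).
    return affinity * pressure_pa / (1.0 + affinity * pressure_pa)


def epoch_descriptor(partial_charges, pressure_pa):
    """Average the single-atom uptake over all atoms in one structure."""
    if not partial_charges:
        raise ValueError("structure has no atoms")
    total = sum(single_atom_uptake(q, pressure_pa) for q in partial_charges)
    return total / len(partial_charges)


# Toy partial charges for one hypothetical structure.
charges = [1.2, -0.6, -0.6, 0.1, -0.1]

# One descriptor value per CO2 partial pressure used in the study.
for pressure in (40.0, 1000.0, 4000.0):
    print(pressure, epoch_descriptor(charges, pressure))
```

Because the surface is evaluated once per atom and then averaged, the cost of the descriptor scales with the number of atoms in the structure and requires no new simulation per MOF, which is consistent with the speed advantage reported for EPoCh.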
Key Findings
The Henry coefficient is the most influential descriptor for predicting CO2 uptake: models that include it reach R² above 0.9 at all three pressures. Because the Henry coefficient is computationally expensive to obtain, the study explores alternatives. The EPoCh descriptors, designed to capture the electrostatic interactions between MOF partial charges and CO2, are the second most important descriptor group and are hundreds of thousands of times faster to compute than the Henry coefficient. With only the EPoCh, atom type, and geometric descriptors, the ML model still performs well, with R² values between 0.69 and 0.74. Even without the Henry coefficient, the model identifies high-performing CO2 capture candidates (defined as having an uptake ≥ 1 mmol g⁻¹) with high recall and precision. At 40 Pa, for example, recall is 0.719, meaning that about 72% of the actual high-performing candidates were correctly identified, and precision is 0.807, meaning that about 81% of the MOFs predicted to be high-performing actually were. The computation-time analysis shows that the EPoCh descriptors drastically reduce computational requirements relative to Henry coefficients, which makes them well suited to high-throughput screening. In the time-weighted performance analysis, EPoCh-based models outperform Henry-based models by a factor of more than 450 in weighted R². Hydrophobicity, which is crucial in DAC, was also analyzed: selecting MOFs whose CO2 Henry coefficient is higher than their H2O Henry coefficient revealed a subset of MOFs with high CO2 uptake.
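The recall and precision figures come from treating the regression output as a yes/no label at the 1 mmol g⁻¹ threshold. The sketch below shows one way to compute them in plain Python. The function name is hypothetical and the uptake values are toy numbers chosen for illustration, not results from the study.

```python
def recall_precision(simulated, predicted, threshold=1.0):
    """Score a regression model as a classifier at an uptake threshold.

    A MOF is an actual positive when its simulated uptake is at or above
    the threshold, and a predicted positive when the model's predicted
    uptake is at or above the same threshold.
    """
    tp = fp = fn = 0
    for actual_uptake, predicted_uptake in zip(simulated, predicted):
        is_positive = actual_uptake >= threshold
        is_flagged = predicted_uptake >= threshold
        if is_positive and is_flagged:
            tp += 1  # correctly identified high performer
        elif is_flagged:
            fp += 1  # flagged, but actually below the threshold
        elif is_positive:
            fn += 1  # high performer the model missed
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision


# Toy uptakes in mmol/g (illustrative only).
simulated = [1.4, 0.3, 1.1, 0.9, 2.0, 0.2]
predicted = [1.2, 0.4, 0.8, 1.05, 1.7, 0.1]

recall, precision = recall_precision(simulated, predicted)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

In a screening setting, recall indicates how many genuinely good candidates survive the ML filter, while precision indicates how much follow-up simulation is spent on false leads.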
Discussion
The findings highlight the efficacy of ML in predicting CO2 adsorption in MOFs, especially under the challenging conditions of DAC. The EPoCh descriptors accelerate the screening process without a large loss of predictive accuracy. Balancing model accuracy against computational cost is critical in high-throughput screening, and the EPoCh descriptors offer a favorable compromise. The high recall and precision show that the model can identify promising MOF candidates for CO2 capture even without the Henry coefficient. While the Henry coefficient provides a comprehensive measure of gas-adsorbent interactions, the EPoCh descriptors capture crucial electrostatic contributions and significantly enhance the model's performance. The hydrophobicity analysis underlines the importance of considering moisture effects when selecting MOFs for DAC applications.
Conclusion
This study demonstrates that machine learning, particularly with the newly introduced EPoCh descriptors, can effectively and efficiently predict CO2 capture in MOFs at low partial pressures. The EPoCh descriptors offer a significant computational advantage over the Henry coefficient while retaining good predictive accuracy. Future research can expand the dataset, add functional group descriptors, and directly model H2O uptake to improve the identification of hydrophobic candidates. The model's ability to predict CO2 adsorption at pressures beyond the training range warrants further investigation and refinement. This work advances the computational design and screening of MOFs for direct air capture applications.
Limitations
The accuracy of the EPoCh descriptor depends on the accuracy of the underlying DFT calculations for partial charges. The study focused on three specific CO2 partial pressures and did not explore the full isotherm. The hydrophobicity analysis did not explicitly incorporate the effects of water in the GCMC simulations and solely relied on Henry coefficients. Future work should incorporate these aspects to provide a more comprehensive assessment.