Introduction
Metal-organic frameworks (MOFs) are crystalline porous materials with modular structures, composed of inorganic nodes, organic nodes, and organic linkers. Their diverse properties make them attractive for various applications, including gas adsorption and storage, catalysis, and drug delivery. MOFs are particularly promising for CO2 capture, exhibiting superior adsorption properties compared to many other materials. However, industrial applications are hindered by stability issues such as poor long-term recyclability and high moisture sensitivity. The vast chemical space of potential building blocks makes an exhaustive experimental search impractical. This research addresses this challenge by developing a high-throughput computational framework, GHP-MOFassemble, to accelerate the discovery of novel MOF structures with high CO2 capture capabilities and synthesizable linkers. This framework leverages the power of generative AI to explore a significantly larger design space than traditional experimental or database-driven methods, offering a new approach towards the rational design of next-generation MOFs for industrial CO2 capture.
Literature Review
Previous approaches to discovering high-performing MOFs for gas adsorption have relied primarily on database search or machine learning (ML)-assisted screening. Database search methods filter existing MOF databases on properties calculated from molecular simulations to identify the best candidates. ML-assisted screening trains regression models on a smaller dataset and uses them to predict the properties of a larger set of MOFs, which reduces the computational cost of simulation. These methods typically represent MOF structures with hand-engineered features or neural networks. Generative modeling, a more recent approach, produces novel MOF structures *de novo* rather than relying on existing databases. Existing generative models include variational autoencoders, generative adversarial networks, normalizing flows, and autoregressive models. This work employs a diffusion model, specifically DiffLinker, to generate novel MOF linkers. Diffusion models have shown promise in drug discovery; here they are applied to MOF design by generating new linkers while keeping the metal nodes and topology fixed.
Methodology
The GHP-MOFassemble framework consists of three main components: Decompose, Generate, and Screen and Predict.
**Decompose:** This step uses a molecular fragmentation algorithm (MMPA) to break the linkers of high-performing MOFs in the hMOF dataset into their constituent molecular fragments. High-performing MOFs are defined as those with a CO2 capacity exceeding 2 mmol g⁻¹ at 0.1 bar. The linkers of the selected MOFs are extracted and duplicates are removed. The remaining linkers are then fragmented to produce a dataset of chemically relevant fragments paired with their connection atoms.
**Generate:** This component uses the pre-trained diffusion model DiffLinker to generate new MOF linkers from the molecular fragments produced in the Decompose step. DiffLinker connects these fragments, and the number of sampled atoms is varied from 5 to 10 to generate diverse linkers. The model outputs the 3D coordinates and atomic species of heavy atoms; Open Babel then adds hydrogen atoms. Dummy atoms are identified to facilitate assembly with metal nodes. The generated linkers are screened (those containing S, Br, or I are removed), and their quality is evaluated with five metrics: SAscore, SCscore, validity, uniqueness, and internal diversity. The linkers are then assembled with one of three pre-selected nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer) into MOFs with pcu topology. Different catenation levels are generated by site translation using pymatgen.
**Screen and Predict:** This final component applies a series of screening steps to filter out unsuitable MOFs and to predict CO2 capacity. First, inter-atomic distances are checked against values in the OChemDb database. A pre-simulation check then verifies chemical validity using UFF4MOF. Next, a modified CGCNN model, trained on the hMOF dataset, predicts CO2 capacity. Finally, molecular dynamics (MD) simulations using LAMMPS validate structural stability, and Grand Canonical Monte Carlo (GCMC) simulations using RASPA calculate CO2 adsorption capacities.
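The selection rule at the start of the Decompose step (keep MOFs above 2 mmol g⁻¹ at 0.1 bar, extract their linkers, drop duplicates) can be illustrated with a minimal Python sketch. This is not the framework's code: the `MofRecord` class, its field names, and the toy entries are hypothetical, and duplicates are detected by exact string match, whereas a real pipeline would presumably compare canonicalized linker representations.

```python
from dataclasses import dataclass

# Threshold stated in the text: CO2 capacity at 0.1 bar, in mmol/g.
CO2_THRESHOLD_MMOL_PER_G = 2.0


@dataclass(frozen=True)
class MofRecord:
    """Hypothetical record for one MOF; not the hMOF dataset schema."""
    name: str
    co2_capacity: float        # mmol/g at 0.1 bar
    linker_smiles: tuple       # linker strings for this MOF


def unique_linkers_of_high_performers(records, threshold=CO2_THRESHOLD_MMOL_PER_G):
    """Collect linkers from MOFs above the threshold, dropping duplicates.

    Order of first appearance is preserved. Duplicates are found by exact
    string comparison, which is a simplifying assumption of this sketch.
    """
    seen = set()
    unique = []
    for rec in records:
        if rec.co2_capacity <= threshold:
            continue  # not a high performer under the stated definition
        for smi in rec.linker_smiles:
            if smi not in seen:
                seen.add(smi)
                unique.append(smi)
    return unique


# Toy data, invented for illustration only.
toy = [
    MofRecord("mof_a", 2.4, ("OC(=O)c1ccc(cc1)C(O)=O",)),
    MofRecord("mof_b", 1.1, ("OC(=O)c1ccncc1",)),
    MofRecord("mof_c", 3.0, ("OC(=O)c1ccc(cc1)C(O)=O", "c1cc(ccn1)-c1ccncc1")),
]
linkers = unique_linkers_of_high_performers(toy)
print(linkers)
```

In this toy input, `mof_b` falls below the threshold and is skipped, and the linker shared by `mof_a` and `mof_c` is kept once, so two linkers would go on to fragmentation.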
Key Findings
GHP-MOFassemble generated 120,000 MOFs within 33 minutes using multiprocessing on 28 cores. After screening for structural and chemical validity, 18,770 MOFs remained. The AI model predicted 364 of these to be high-performing (CO2 capacity > 2 mmol g⁻¹ at 0.1 bar). MD simulations identified 102 stable MOFs among them. GCMC simulations confirmed six MOF candidates with CO2 capacities exceeding 2 mmol g⁻¹, a level that corresponds to the top 5% of the hMOF dataset. Analysis revealed that linkers in high-performing hMOF structures have a higher proportion of carboxylic groups, while hydroxyl groups are more frequent in the AI-generated MOFs. The AI-generated MOFs showed a higher proportion of high-performing cat2 and cat3 structures than the hMOF dataset, suggesting that catenation is beneficial for CO2 adsorption. The framework completed the analysis, from assembly to the selection of high-performing MOFs, within 12 hours using distributed computing. A benchmark analysis showed that the AI components of the workflow are significantly faster than the more computationally expensive MD and GCMC simulations.
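To make the attrition across the screening stages concrete, the following short Python sketch computes the fraction of candidates that survive each stage, using only the counts reported above. It assumes each stage operates on the survivors of the previous one; the stage labels and the function name are illustrative.

```python
# Counts reported in the summary; stage labels are illustrative.
funnel = [
    ("assembled", 120_000),
    ("structurally and chemically valid", 18_770),
    ("predicted high-performing", 364),
    ("stable in MD", 102),
    ("confirmed by GCMC", 6),
]


def stage_pass_rates(stages):
    """Return (stage name, survivors / previous stage count) for each step."""
    rates = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        rates.append((name, after / before))
    return rates


rates = stage_pass_rates(funnel)
overall = funnel[-1][1] / funnel[0][1]

for name, rate in rates:
    print(f"{name}: {rate:.2%} of previous stage")
print(f"overall: {overall:.4%} of assembled candidates")
```

Under that assumption, roughly 15.6% of assembled MOFs pass the validity checks, about 1.9% of those are predicted to be high-performing, about 28% of the predicted high performers are stable in MD, and about 5.9% of the stable ones are confirmed by GCMC, or 6 in 120,000 overall.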
Discussion
The findings demonstrate the efficacy of GHP-MOFassemble in accelerating the discovery of novel high-performing MOFs for CO2 capture. The integration of generative AI with high-throughput screening and detailed simulations enables efficient exploration of the vast chemical space of MOFs. The results highlight the importance of combining AI-driven design with rigorous validation using established simulation methods. The observation of a higher proportion of high-performing catenated structures suggests that future design efforts should consider the role of catenation in enhancing CO2 adsorption. The differences in functional group distributions between AI-generated and hMOF linkers indicate the capacity of the generative AI to discover novel chemical structures with improved properties. The high accuracy of the CGCNN model in classifying high and low performers supports the model's reliability in screening a large number of candidates. The framework’s success in identifying six high-performing MOFs within a reasonable timeframe emphasizes its potential for broader applications in materials discovery.
Conclusion
GHP-MOFassemble provides a powerful framework for accelerating the design and discovery of high-performing MOFs for CO2 capture. The integration of generative AI, high-throughput screening, and advanced simulations allows for efficient exploration of the vast chemical space, leading to the identification of novel MOF structures with improved CO2 adsorption capacities. Future work could explore different topologies, metal nodes, and generative AI models. Optimizing the workflow and incorporating additional predictive models could further improve the efficiency and accuracy of the framework.
Limitations
The study focuses on MOFs with pcu topology and three specific metal nodes. The generalizability of the findings to other topologies and metal nodes needs further investigation. The accuracy of the CO2 adsorption capacity predictions relies on the accuracy of the trained CGCNN model, which might have limitations in extrapolating to unseen chemical space. The computational cost of the MD and GCMC simulations can be significant, particularly for large MOF structures. Future studies could explore strategies to reduce this cost, such as using more efficient force fields or employing more advanced sampling techniques.