Introduction
Material discovery has historically progressed through three paradigms: experimental trial-and-error, theoretical modeling based on experimental results, and computational simulations of theoretical models. The emergence of large datasets generated from these paradigms, coupled with advanced algorithms, has ushered in a fourth paradigm: data-driven discovery using artificial intelligence (AI). AI and virtual screening methods are increasingly applied across materials science to identify novel compounds with desired functionalities. Two-dimensional (2D) materials, possessing unique and tunable properties, show great promise in various applications. However, traditional methods for discovering new 2D materials, reliant on computationally intensive density functional theory (DFT) calculations, have limited the exploration of compositional space. This study presents a recipe leveraging AI to overcome these limitations, enabling the efficient virtual screening of a vastly expanded chemical space of 2D materials. The authors aim to generate a comprehensive database of potentially stable 2D materials and predict their key properties relevant to energy applications, accelerating the discovery of functional 2D materials.
Literature Review
The authors review the history of materials discovery, highlighting the shift towards data-driven approaches using AI and machine learning (ML). They discuss the successful application of these methods in various subfields of chemistry and materials science. The review also emphasizes the challenges inherent in applying AI to diverse material subfields, including the need for high-quality and high-fidelity data from experiments or simulations. The authors specifically focus on the growing interest in 2D materials, noting their potential and the limitations of existing methods that rely heavily on DFT calculations, which are resource-intensive and restrict the exploration of large chemical spaces. The emergence of in silico 2D materials repositories, generated through exfoliation of 3D bulk structures and combinatorial atomic exchange in 2D structures, is highlighted as a valuable source of data for AI-driven approaches. The authors cite relevant studies demonstrating the use of AI for materials discovery and virtual screening in the context of 2D materials and other material classes, emphasizing the novelty and potential impact of their proposed data-driven method for discovering new functional 2D materials.
Methodology
The authors' methodology involves three key steps (illustrated in Fig. 1):
**1. Generation of new 2D materials:** Starting with 22 different 2D crystal prototypes and 52 chemical elements, a brute-force approach generates a vast library of over 72 million 2D compounds. Prototypes are selected from training data (Fig. 2a), and elements are grouped based on assumed charge states (Fig. 2b). Elemental substitutions are systematically performed to create new 2D material candidates. The resulting chemical screening space is visualized in a heatmap (Fig. 3).
**2. Filtering:** Three sequential filters reduce the initial set of candidates to a manageable size while maintaining the diversity of the dataset: (a) a symmetry filter removing geometrically duplicate compounds; (b) a neutrality filter discarding compounds with nonzero net charge; and (c) a stability filter eliminating compounds predicted to be unstable, using ML models trained on data from the Computational 2D Materials Database (C2DB). This filtering process leaves 316,505 2D material candidates predicted to be stable; the number of materials remaining after each filtering step is detailed in Fig. 4 and Table 1.
**3. Property prediction:** Artificial neural networks (ANNs) are trained using data from the C2DB (2226 2D materials) to predict key properties relevant to energy conversion and storage. These properties include stability, heat of formation, energy above the convex hull, band gap, valence band maximum (VBM), conduction band minimum (CBM), work function, and magnetic state. Only basic, computationally inexpensive chemical features are used as input to the ANN models. The performance of the ANN models is evaluated using 20-fold cross-validation (Fig. 5). The predicted properties are then used for virtual screening to identify promising candidates for various energy applications.
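Steps 1 and 2 can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the two prototypes, the ten elements, their assigned charge states, and the rule that sites with equal multiplicity are symmetry-equivalent are hypothetical stand-ins, not the authors' 22 prototypes, 52 elements, or actual symmetry analysis. The ML-based stability filter is omitted because it requires a trained model.

```python
from itertools import product

# Hypothetical charge-state groups (NOT the element table used in the study).
CHARGE_GROUPS = {
    +1: ["Li", "Na"],
    +2: ["Mg", "Ca"],
    +4: ["Ti", "Zr"],
    -1: ["Cl", "Br"],
    -2: ["O", "S"],
}
ELEMENT_CHARGE = {el: q for q, els in CHARGE_GROUPS.items() for el in els}

# Hypothetical prototypes: name -> number of atoms on each distinct site.
PROTOTYPES = {"AB2": (1, 2), "AB": (1, 1)}


def enumerate_candidates(prototypes, elements):
    """Brute-force substitution: place every ordered choice of distinct elements on the sites."""
    for name, counts in prototypes.items():
        for combo in product(elements, repeat=len(counts)):
            if len(set(combo)) == len(combo):
                yield name, combo, counts


def symmetry_key(name, combo, counts):
    """Simplified duplicate check: sites with equal multiplicity are treated as interchangeable."""
    return name, tuple(sorted(zip(counts, combo)))


def is_neutral(combo, counts, charges):
    """Neutrality filter: the assumed charges must sum to zero over the formula unit."""
    return sum(charges[el] * n for el, n in zip(combo, counts)) == 0


def formula(combo, counts, charges):
    """Write the formula with the most positive species first."""
    pairs = sorted(zip(combo, counts), key=lambda p: -charges[p[0]])
    return "".join(el + (str(n) if n > 1 else "") for el, n in pairs)


candidates = list(enumerate_candidates(PROTOTYPES, sorted(ELEMENT_CHARGE)))

# Symmetry filter: keep the first representative of each equivalence class.
seen = set()
unique = []
for name, combo, counts in candidates:
    key = symmetry_key(name, combo, counts)
    if key not in seen:
        seen.add(key)
        unique.append((name, combo, counts))

# Neutrality filter.
neutral = [
    (name, formula(combo, counts, ELEMENT_CHARGE))
    for name, combo, counts in unique
    if is_neutral(combo, counts, ELEMENT_CHARGE)
]

print(len(candidates), len(unique), len(neutral))
```

Even in this toy setting the candidate count grows multiplicatively with the number of prototypes, sites, and elements, which is why cheap filters are applied before any model-based stability prediction.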
The authors also address the limitations of the PBE functional used in DFT calculations within C2DB for predicting electronic properties. They perform a regression study between PBE and G0W0 results on 188 2D materials from C2DB, deriving equations to rescale PBE predictions to G0W0 values (Eqs. 1-3). These rescaled values are included in the resulting Virtual 2D Materials Database (V2DB). The choices of prototypes, elements, and stability thresholds are discussed as key parameters affecting the model's robustness.
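The rescaling idea can be sketched as an ordinary least-squares fit of G0W0 values against PBE values, assuming a linear form (the summary does not state the form of Eqs. 1-3). The paired values below are invented placeholders, not data or coefficients from the study, and `fit_line` and `rescale` are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rescale(pbe_value, slope, intercept):
    """Map a PBE-level value onto the G0W0 scale using the fitted line."""
    return slope * pbe_value + intercept


# Placeholder band gaps in eV (illustrative only, NOT values from C2DB).
pbe_gaps = [0.5, 1.0, 1.5, 2.0]
g0w0_gaps = [1.4, 2.1, 2.8, 3.5]

slope, intercept = fit_line(pbe_gaps, g0w0_gaps)
print(rescale(1.2, slope, intercept))
```

In practice one such fit would be made per property (band gap, VBM, CBM), and the quality of the fit would bound how much the rescaled values can be trusted.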
Key Findings
The study successfully generated a Virtual 2D Materials Database (V2DB) comprising 316,505 likely stable 2D materials. The V2DB incorporates predicted physicochemical, electronic, and magnetic properties relevant to energy applications. The authors demonstrate the effectiveness of their AI-aided virtual screening approach by identifying thousands of promising candidates for photovoltaic and photocatalytic applications. The predicted properties, including band gap, VBM, and CBM, are used to identify suitable materials for specific energy applications (Fig. 6). Materials with band gaps between 0.75 and 1.75 eV are identified as potential candidates for single-junction photovoltaic cells. The authors also provide criteria based on band gap, VBM, and CBM for identifying efficient 2D materials for photocatalytic water, CO2, and N2 conversion. A regression analysis comparing the band gap, VBM, and CBM values predicted by the ML models with those from G0W0 calculations (a more accurate but computationally expensive method) demonstrates a high degree of correlation (Fig. 7). Validation of the ML models against the 2Dmatpedia database, an independent dataset, further supports the reliability of the predicted properties. The study highlights that basic elemental and structural information is sufficient for predicting the stability and key properties of 2D materials. The heatmap of the chemical screening space (Fig. 3) aids in assessing the reliability of predictions across the entire compound space.
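The screening step amounts to applying simple thresholds to the predicted properties. The sketch below uses the 0.75 to 1.75 eV photovoltaic window stated above. The water-splitting criterion is an assumption: it uses the commonly quoted H+/H2 and O2/H2O levels relative to vacuum at pH 0 (about -4.44 eV and -5.67 eV), which may differ from the authors' exact criteria. The candidate records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    band_gap: float  # eV
    vbm: float       # eV relative to vacuum
    cbm: float       # eV relative to vacuum


PV_GAP_WINDOW = (0.75, 1.75)  # eV, window quoted in the text
H2_LEVEL = -4.44              # eV vs vacuum at pH 0 (assumed reference value)
O2_LEVEL = -5.67              # eV vs vacuum at pH 0 (assumed reference value)


def is_pv_candidate(c: Candidate) -> bool:
    """Band gap inside the single-junction photovoltaic window."""
    low, high = PV_GAP_WINDOW
    return low <= c.band_gap <= high


def straddles_water_redox(c: Candidate) -> bool:
    """Band edges straddle both water redox levels (assumed criterion)."""
    return c.cbm > H2_LEVEL and c.vbm < O2_LEVEL


# Hypothetical predicted properties, for illustration only.
pool = [
    Candidate("X1", band_gap=1.2, vbm=-5.3, cbm=-4.1),
    Candidate("X2", band_gap=2.3, vbm=-6.2, cbm=-3.9),
    Candidate("X3", band_gap=0.3, vbm=-4.9, cbm=-4.6),
]

pv_hits = [c.name for c in pool if is_pv_candidate(c)]
water_hits = [c.name for c in pool if straddles_water_redox(c)]
print(pv_hits, water_hits)
```

Analogous filters for CO2 or N2 conversion would swap in the corresponding redox levels.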
Discussion
The study demonstrates the potential of AI-driven virtual screening for accelerating the discovery of novel functional 2D materials. The creation of the V2DB, containing a vast number of predicted stable materials, represents a significant advancement in materials science. The successful identification of promising candidates for photovoltaic and photocatalytic applications validates the efficacy of the proposed methodology. The reliance on basic chemical features as input to the ML models showcases the potential for scalability and applicability to future datasets. The comparison with independent datasets reinforces the robustness of the developed models. The limitations of the PBE functional and the subsequent rescaling to G0W0 values highlight a strategy for improving the accuracy of the predictions. This research contributes to a broader understanding of the potential of AI for high-throughput materials discovery and lays a foundation for future studies exploring more complex material properties and larger chemical spaces.
Conclusion
This work successfully demonstrates the utility of AI-aided virtual screening for discovering new 2D materials for energy applications. The creation of the V2DB, freely accessible to the research community, significantly expands the landscape of potential 2D materials. The methodology presented is a robust and scalable approach that can be adapted for future virtual screening efforts, particularly as the availability of high-quality data on 2D materials increases. Future work could focus on exploring different AI algorithms, expanding the range of predicted properties, incorporating more accurate DFT functionals, and further refining the stability criteria to improve prediction accuracy and identify materials suitable for diverse applications beyond energy conversion and storage.
Limitations
The accuracy of the predicted properties depends on the quality of the training data, which primarily uses DFT calculations with the PBE functional. While a rescaling method was applied to improve accuracy, inherent limitations of the PBE functional remain. The generalizability of the models may be limited by the selection of prototypes and elements used in the initial generation step and the representation of these in the training data. The stability predictions indicate thermodynamic stability but do not guarantee the experimental synthesizability of all predicted materials. Further experimental validation is necessary to confirm the actual properties and synthesizability of these materials. Additionally, the assessment of toxicity and the abundance of the constituent elements should be considered for practical applications.