Introduction
The discovery of new materials is crucial for technological advancement. Traditional experimental methods are often time-consuming and expensive. Computational approaches, including high-throughput screening and machine learning techniques, offer promising alternatives to accelerate this process. Two-dimensional (2D) materials, with their unique properties, have attracted significant attention, but their discovery remains challenging. This research investigates the potential of deep generative models to efficiently generate and explore the vast landscape of 2D materials. The study focuses on comparing a Crystal Diffusion Variational Autoencoder (CDVAE), a deep generative model trained on known 2D materials, against a lattice decoration protocol (LDP), a more conventional method for generating new structures. The goal is to assess the effectiveness of CDVAE in generating diverse and thermodynamically stable 2D crystals, comparing its performance with the established LDP. The importance of this study lies in evaluating the capacity of a powerful machine learning approach to significantly enhance the efficiency and scope of 2D materials discovery, potentially leading to the identification of materials with novel and desirable properties for various applications.
Literature Review
High-throughput computational screening has been instrumental in the discovery of novel materials, particularly in areas like thermoelectrics, photovoltaics, and battery materials. Methods such as automated searches and dynamic workflow systems have been developed to manage the computational burden. Databases like the Open Quantum Materials Database (OQMD) and the Computational 2D Materials Database (C2DB) provide valuable resources for materials discovery. Recent efforts have explored the use of machine learning, specifically deep generative models, to predict and design new materials. Generative adversarial networks (GANs) and variational autoencoders (VAEs) have been applied to predict crystal structures, demonstrating success in generating diverse and stable materials. However, the application of deep generative models to the specific challenge of discovering novel 2D materials remains relatively unexplored. This study builds upon this existing body of work by applying a cutting-edge deep generative model, CDVAE, to the generation of 2D materials and directly comparing its performance with the established lattice decoration method. This allows for a quantitative assessment of the effectiveness of CDVAE in materials discovery.
Methodology
The workflow started from a set of 2615 known 2D materials, which served both as training data for the CDVAE model and as seed structures for the LDP. Two new sets of crystal structures were then generated, one with CDVAE and one with LDP. Duplicate structures were removed using the root mean square distance (RMSD) calculated with pymatgen: structures with RMSD < 0.3 Å were considered duplicates, and only the structure with the lowest heat of formation was kept. For the initial LDP structures, a simpler duplicate removal based on reduced formula and space group was used. The unique crystal structures were relaxed with Density Functional Theory (DFT) calculations using the PBE exchange-correlation functional in the GPAW code, with a plane wave cut-off energy of 800 eV and a k-point density of at least 4 Å. Relaxation stopped when the maximum force was below 0.01 eV/Å and the maximum stress was below 0.002 eV/ų. Materials that relaxed into non-2D structures or failed to converge in DFT were discarded. Thermodynamic stability was assessed by calculating the heat of formation (ΔH) and the energy above the convex hull (ΔHhull). The convex hull was determined from reference databases including C2DB and a subset of OQMD. The CDVAE was trained on 70% of the initial 2D material dataset, with 15% held out for validation and 15% for testing, using the same hyperparameters that Xie et al. employed for their MP-20 dataset. Because CDVAE was designed for 3D materials, an artificial periodicity was introduced in the non-periodic direction by using a lattice vector an order of magnitude larger than those in the periodic directions. This ensured that the graph networks in CDVAE only connected atoms within the 2D layer and therefore learned to generate 2D materials. Structural diversity was analyzed using histograms of stoichiometry, space group number, and occupied Wyckoff positions.
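The duplicate-removal rule described above (RMSD below 0.3 Å counts as a duplicate, and the structure with the lowest heat of formation is kept) can be sketched as a greedy filter. This is a minimal illustration, not the authors' implementation: the entry layout (`id`, `formula`, `dH`, `coords`), the function names, and the `toy_rmsd` stand-in are all assumptions made for the example. In the actual workflow the distance comes from pymatgen, which also handles atom permutations, cell choice, and periodicity.

```python
import math
from typing import Callable, List, Optional

RMSD_TOL = 0.3  # Å, the duplicate threshold quoted in the text


def toy_rmsd(a: dict, b: dict) -> Optional[float]:
    """Hypothetical stand-in for a real structure comparison.

    Returns None when the two entries cannot be matched at all
    (different formula or atom count); otherwise the plain RMSD
    between atoms taken in the same order.
    """
    if a["formula"] != b["formula"] or len(a["coords"]) != len(b["coords"]):
        return None
    squared = [
        sum((p - q) ** 2 for p, q in zip(ra, rb))
        for ra, rb in zip(a["coords"], b["coords"])
    ]
    return math.sqrt(sum(squared) / len(squared))


def deduplicate(
    entries: List[dict],
    rmsd: Callable[[dict, dict], Optional[float]],
    tol: float = RMSD_TOL,
) -> List[dict]:
    """Keep one representative per group of near-identical structures.

    Entries are visited in order of increasing heat of formation, so the
    first member of each group to be kept is its lowest-energy member.
    """
    kept: List[dict] = []
    for entry in sorted(entries, key=lambda e: e["dH"]):
        is_duplicate = False
        for reference in kept:
            distance = rmsd(entry, reference)
            if distance is not None and distance < tol:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(entry)
    return kept


# Toy data (made up for illustration; not from the study).
entries = [
    {"id": "a", "formula": "MoS2", "dH": -0.90,
     "coords": [(0, 0, 0), (1, 1, 1.5), (1, 1, -1.5)]},
    {"id": "b", "formula": "MoS2", "dH": -0.95,
     "coords": [(0, 0, 0), (1.1, 1, 1.5), (1, 1.1, -1.5)]},
    {"id": "c", "formula": "MoS2", "dH": -0.50,
     "coords": [(0, 0, 0), (2, 0, 1.5), (2, 0, -1.5)]},
    {"id": "d", "formula": "WS2", "dH": -0.80,
     "coords": [(0, 0, 0), (1, 1, 1.5), (1, 1, -1.5)]},
]

unique = deduplicate(entries, toy_rmsd)
unique_ids = [e["id"] for e in unique]
print(unique_ids)
```

Sorting by heat of formation first means the "keep the lowest-energy structure" rule falls out of the visiting order, with no separate grouping step. The trade-off is that a greedy pass depends on the threshold being small compared with the spacing between genuinely distinct structures.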
t-SNE embedding was used to visualize the structural distribution of the generated materials compared to the seed structures.
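For reference, the two stability measures used in the methodology can be written in their standard form. The notation here is an assumption, since the summary does not give the paper's exact symbols or reference energies: $E_{\mathrm{tot}}$ is the DFT total energy of the material, $n_i$ the number of atoms of element $i$ in the cell, $\mu_i^{\mathrm{ref}}$ the energy per atom of element $i$ in its reference phase, and $\Delta H^{\mathrm{hull}}(x)$ the lowest heat of formation attainable at composition $x$ by any combination of the competing reference phases.

```latex
\Delta H = \frac{E_{\mathrm{tot}} - \sum_i n_i \, \mu_i^{\mathrm{ref}}}{\sum_i n_i},
\qquad
\Delta H_{\mathrm{hull}} = \Delta H - \Delta H^{\mathrm{hull}}(x)
```

Both quantities are per atom, which is why the thresholds in the findings are quoted in eV/atom. A small $\Delta H_{\mathrm{hull}}$ means the material lies close in energy to the most stable known combination of competing phases at the same composition.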
Key Findings
The DFT relaxation success rates were 82% for LDP and 69% for CDVAE. On average, the energy decreased by 0.51 eV/atom during relaxation for CDVAE structures and by 0.62 eV/atom for LDP structures, so the CDVAE structures started at least as close in energy to their relaxed counterparts, even though they required more relaxation steps (55.5 vs 40.1). Failures were attributed to magnetic ground states and convergence issues in the Kohn-Sham SCF cycle, particularly for materials containing 3d transition metals. The thermodynamic stability, as reflected in the distribution of ΔHhull, was remarkably similar for both methods: about 74% of materials from both methods had ΔHhull below 0.3 eV/atom. CDVAE generated structures with more unique elements than LDP. The seed structures contain at most 5 unique elements and some CDVAE structures exceed this, whereas LDP is limited to the stoichiometries present in the seed materials. Thermodynamic stability decreased as the number of unique elements increased. A CDVAE model trained on unstable materials (ΔHhull > 0.4 eV/atom) produced structures significantly further from the convex hull, confirming the model's ability to learn stability properties. CDVAE produced a much higher occurrence of oxygen, chalcogens, and halogens, possibly indicating some overfitting. CDVAE generated 239 unique stoichiometries, compared to 87 in the seed structures and 103 in the LDP-generated structures, illustrating its ability to create new compositions. The number of unique combinations of space group and occupied Wyckoff positions was also substantially higher for CDVAE (130 new combinations in 357 materials) than for LDP (76 new combinations in 339 materials). The t-SNE embedding showed that CDVAE-generated structures were more spread out than the seed structures and LDP-generated materials, reflecting greater diversity.
A noteworthy example was a cluster of CDVAE-generated materials with stoichiometry ABC2D2, space group number 25, and Wyckoff positions a, b, c, d, representing a new class of materials (123 new materials, 30 of them within 50 meV of the convex hull). Overall, the study generated over 8500 unique 2D crystals with heats of formation within 0.3 eV/atom of the convex hull, more than 2000 of which lie within 50 meV/atom, a significant expansion of the known stable 2D materials.
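The diversity counts quoted above are easier to compare as rates. The short sketch below does only that arithmetic. It reads "130 new combinations in 357 materials" as new combinations per material examined, which is an interpretation of the summary rather than a metric defined in the paper, and the variable names are illustrative.

```python
# Counts quoted in the findings: (new space-group/Wyckoff combinations, materials examined)
new_combinations = {"CDVAE": (130, 357), "LDP": (76, 339)}

# Unique stoichiometries per structure set, as quoted in the findings
unique_stoichiometries = {"seed": 87, "LDP": 103, "CDVAE": 239}

# Fraction of examined materials that introduced a new combination
combo_rate = {
    method: new / total for method, (new, total) in new_combinations.items()
}

# Unique stoichiometries relative to the seed set
stoich_ratio = {
    method: count / unique_stoichiometries["seed"]
    for method, count in unique_stoichiometries.items()
    if method != "seed"
}

for method in ("CDVAE", "LDP"):
    print(
        f"{method}: {combo_rate[method]:.1%} new combinations, "
        f"{stoich_ratio[method]:.2f}x the seed stoichiometries"
    )
```

On these numbers, roughly 36% of the examined CDVAE materials introduce a new space group and Wyckoff combination, against roughly 22% for LDP. CDVAE also covers about 2.7 times as many stoichiometries as the seed set.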
Discussion
The results demonstrate that CDVAE is a powerful tool for 2D materials discovery. Its ability to generate diverse structures with stability comparable to those produced by LDP, a well-established method, is significant. The enhanced diversity of the CDVAE-generated materials, particularly the new stoichiometries and space group/Wyckoff position combinations, shows its capacity to go beyond the limitations of traditional methods. The observation that CDVAE learns the stability properties of its training data, as evidenced by the model trained on unstable structures producing less stable materials, supports the reliability of the model. The tendency of CDVAE to generate complex, low-symmetry structures could be attributed to the non-Gaussian distribution of the underlying structure of materials and to sampling from out-of-distribution points in latent space. While this may result in some overfitting toward particular elements, it is also a key advantage of CDVAE, as it allows the generation of new material classes absent from the training data. The fact that only about a quarter of the generated materials had ΔHhull above the 0.3 eV/atom threshold demonstrates that CDVAE produces materials with thermodynamic stability comparable to LDP while also offering substantial structural novelty. The method greatly expands the known space of 2D materials, paving the way for future investigations.
Conclusion
This study successfully employed a deep generative model (CDVAE) in combination with LDP to generate over 8500 unique 2D crystals with high thermodynamic stability. CDVAE demonstrated superior diversity compared to LDP, generating new stoichiometries and structural combinations. The model's ability to learn stability properties and produce diverse materials highlights the promise of deep generative models in materials discovery. Future work could explore refining the model to address potential overfitting and further investigate the synthesizability of the generated materials. The generated materials and their properties are publicly available through C2DB.
Limitations
The study's primary limitation is the reliance on DFT calculations, which have inherent limitations in accuracy, particularly for complex systems or those with strong correlation effects. The potential overfitting of CDVAE towards specific elements, like oxygen and chalcogens, suggests a need for further model refinement to enhance broader chemical space exploration. The success rate of DFT relaxation varied between methods, and some convergence problems may have biased the results, although the impact was analyzed and acknowledged. The assessment of thermodynamic stability is based solely on energy calculations and does not account for kinetic factors or substrate interactions that influence material synthesis. Further experimental validation is needed to confirm the synthesizability of the predicted materials.