Deep Generative Modeling of Two-Dimensional Crystals

Engineering and Technology

P. Lyngby and K. Thygesen

This research by P. Lyngby and K. S. Thygesen uses a Crystal Diffusion Variational Autoencoder (CDVAE), together with a lattice decoration protocol (LDP), to generate new two-dimensional crystals. The findings show that CDVAE produces a chemically and structurally more diverse set of stable 2D materials than LDP, with comparable thermodynamic stability.

Introduction
The study explores data-driven generation of novel two-dimensional (2D) materials by combining a deep generative model—Crystal Diffusion Variational Autoencoder (CDVAE)—with a lattice decoration protocol (LDP). Using a common seed set of 2615 2D materials, the work aims to: (i) generate large numbers of candidate 2D crystal structures, (ii) relax them with density functional theory (DFT), and (iii) evaluate their thermodynamic stability and structural diversity. The central questions are whether CDVAE can learn stability trends from training data and whether it can produce chemically and structurally diverse, potentially synthesizable 2D materials beyond those attainable by lattice decoration.
Literature Review
The work builds on high-throughput DFT screening frameworks and databases (e.g., OQMD, Materials Project, C2DB) and a growing literature on inverse design using generative models for inorganic crystals (e.g., VAEs, GANs, diffusion-based approaches). Prior efforts have focused on bulk crystals and targeted lattices or properties. CDVAE was introduced for periodic 3D crystals; adapting it to 2D requires handling non-periodicity in one lattice direction. Previous studies indicate that energy above the convex hull (ΔH_hull) is an imperfect but widely used proxy for synthesizability, with examples of synthesized phases above 0.1 eV/atom, underscoring that kinetics and substrates also matter for 2D materials.
Methodology
Workflow and data generation: Starting from 2615 seed 2D materials (also used to train CDVAE), two sets of new structures are created: (1) by sampling CDVAE and (2) by applying lattice decoration (LDP) to the same seed structures. Duplicates are removed within each set. The unique structures are then relaxed with DFT, after which a second round of duplicate removal is performed and non-2D structures are eliminated (dimensionality analysis per ref. 31). Finally, two thermodynamic metrics are computed: the heat of formation (ΔH) and the energy above the convex hull (ΔH_hull).
DFT details: All DFT calculations use GPAW with the PBE exchange–correlation functional, a plane-wave cutoff of 800 eV, and a k-point density of at least 4 Å. Geometry relaxations stop when the maximum force is below 0.01 eV/Å and the maximum stress is below 0.002 eV/Å^3. All calculations are spin-polarized.
Duplicate detection: Duplicates are identified by the root-mean-square distance (RMSD) between structures, computed with pymatgen; structures with RMSD < 0.3 Å are treated as duplicates, and only the one with the lowest ΔH is kept. For the initial LDP structures, before DFT, a coarser rule is used: structures with the same reduced formula and space group are considered duplicates.
Convex hull reference: The hull is constructed from C2DB together with a reference set of 9590 elemental, binary, and ternary crystals lying within 20 meV/atom of the OQMD convex hull. These reference structures were originally relaxed with VASP (PBE, or PBE+U for selected oxides); for consistency, their total energies were re-evaluated with GPAW without re-optimizing the geometries.
Adapting CDVAE to 2D: CDVAE expects 3D periodic crystals. For 2D materials, an artificial periodicity is introduced along the non-periodic direction by choosing an out-of-plane lattice vector much longer than the in-plane vectors, so that the model's graph connectivity remains within the 2D layer. The data are split 70%/15%/15% into training, validation, and test sets, and the hyperparameters follow Xie et al. (ref. 39).
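The two thermodynamic metrics can be illustrated on a toy binary A-B system. The sketch below is a minimal, standard-library-only illustration and not the authors' implementation: it computes ΔH per atom relative to elemental reference energies, then ΔH_hull as the vertical distance to the lower convex hull in (composition, ΔH) space. The element labels, energies, and function names are hypothetical. The hull used in the study spans elemental, binary, and ternary phases from C2DB and OQMD, which calls for a general N-dimensional construction (for example pymatgen's `PhaseDiagram`).

```python
# Toy binary-system illustration; all energies below are made up.


def heat_of_formation(total_energy, counts, ref_energies):
    """Heat of formation per atom: (E_total - sum_i n_i * mu_i) / N."""
    n_atoms = sum(counts.values())
    reference = sum(n * ref_energies[el] for el, n in counts.items())
    return (total_energy - reference) / n_atoms


def lower_hull(points):
    """Lower convex hull of (x, dH) points (Andrew's monotone chain)."""
    best = {}
    for x, y in points:  # keep only the lowest energy at each composition
        best[x] = min(y, best.get(x, y))
    hull = []
    for p in sorted(best.items()):
        # Pop the last vertex while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x, dh, hull):
    """Vertical distance (eV/atom) from (x, dh) down to the hull at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return dh - (y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the hull range")


# Hypothetical elemental reference energies (eV/atom) and total energies (eV).
mu = {"A": -3.0, "B": -5.0}
candidates = {
    "AB": ({"A": 1, "B": 1}, -9.0),
    "AB2": ({"A": 1, "B": 2}, -13.6),
    "A2B": ({"A": 2, "B": 1}, -11.3),
}

points = [(0.0, 0.0), (1.0, 0.0)]  # pure A and pure B have dH = 0
data = {}
for name, (counts, e_tot) in candidates.items():
    x_b = counts["B"] / sum(counts.values())  # fraction of B atoms
    dh = heat_of_formation(e_tot, counts, mu)
    data[name] = (x_b, dh)
    points.append((x_b, dh))

hull = lower_hull(points)
e_hull = {name: energy_above_hull(x, dh, hull) for name, (x, dh) in data.items()}
for name, value in e_hull.items():
    print(f"{name}: dH_hull = {value:.3f} eV/atom")
```

With these made-up numbers AB sits on the hull, while AB2 and A2B lie about 0.13 and 0.23 eV/atom above it.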
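The two duplicate rules can be sketched as follows. This is a simplified stand-in that assumes structures already share atom ordering and alignment; a real comparison (for example with pymatgen's `StructureMatcher`) must also handle the choice of lattice and atom permutations, which is not reproduced here. Only the 0.3 Å threshold, the keep-lowest-ΔH rule, and the (reduced formula, space group) key come from the text; the helper names and example structures are hypothetical.

```python
import math
from functools import reduce


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally ordered, already aligned lists of positions."""
    if len(coords_a) != len(coords_b):
        return math.inf
    sq = sum((pa - pb) ** 2 for a, b in zip(coords_a, coords_b) for pa, pb in zip(a, b))
    return math.sqrt(sq / len(coords_a))


def deduplicate(structures, threshold=0.3):
    """Visit structures in order of increasing dH and keep one only if it is not
    within `threshold` Å RMSD of an already kept structure with the same species."""
    kept = []
    for s in sorted(structures, key=lambda item: item["dH"]):
        is_duplicate = any(
            s["species"] == k["species"] and rmsd(s["coords"], k["coords"]) < threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(s)
    return kept


def reduced_formula(counts):
    """E.g. {"Mo": 2, "S": 4} -> "MoS2" (elements in alphabetical order)."""
    g = reduce(math.gcd, counts.values())
    return "".join(el + (str(n // g) if n // g > 1 else "") for el, n in sorted(counts.items()))


def coarse_key(counts, spacegroup):
    """Coarse pre-DFT duplicate key, as described for the LDP structures."""
    return (reduced_formula(counts), spacegroup)


# Hypothetical Cartesian coordinates (Å) and dH values (eV/atom).
structures = [
    {"id": "s1", "species": ["Mo", "S", "S"], "dH": -0.80,
     "coords": [(0.0, 0.0, 0.0), (1.59, 0.92, 1.56), (1.59, 0.92, -1.56)]},
    {"id": "s2", "species": ["Mo", "S", "S"], "dH": -0.78,
     "coords": [(0.0, 0.0, 0.0), (1.60, 0.92, 1.58), (1.59, 0.93, -1.56)]},
    {"id": "s3", "species": ["Mo", "S", "S"], "dH": -0.50,
     "coords": [(0.0, 0.0, 0.0), (1.59, 0.92, 2.50), (1.59, 0.92, -2.50)]},
]
print([s["id"] for s in deduplicate(structures)])
print(coarse_key({"Mo": 2, "S": 4}, 187) == coarse_key({"Mo": 1, "S": 2}, 187))
```

Here s2 differs from s1 by about 0.01 Å RMSD and is dropped in favor of the lower-ΔH s1, while s3 is kept; the two MoS2 compositions also collapse to the same coarse key.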
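The effect of the artificial periodicity can be checked with a small radius-based neighbor search. The sketch assumes a hypothetical 5 Å cutoff and an illustrative MoS2-like monolayer, and it does not reproduce CDVAE's actual graph construction; it only shows why the out-of-plane vector must be long compared with the cutoff. The function names are hypothetical.

```python
import itertools
import math


def padded_lattice(a, b, c_length):
    """In-plane vectors a, b plus an artificial out-of-plane vector of length c_length (Å)."""
    return [a, b, (0.0, 0.0, c_length)]


def periodic_edges(positions, lattice, cutoff, reps=2):
    """All (i, j, image) pairs closer than `cutoff` Å, scanning images -reps..reps
    along each lattice vector (reps must be large enough for the chosen cutoff)."""
    edges = []
    for image in itertools.product(range(-reps, reps + 1), repeat=3):
        shift = [sum(image[k] * lattice[k][d] for k in range(3)) for d in range(3)]
        for i, ri in enumerate(positions):
            for j, rj in enumerate(positions):
                if i == j and image == (0, 0, 0):
                    continue  # skip self-interaction in the home cell
                rj_image = [rj[d] + shift[d] for d in range(3)]
                if math.dist(ri, rj_image) < cutoff:
                    edges.append((i, j, image))
    return edges


# Illustrative MoS2-like monolayer with an in-plane lattice constant of 3.19 Å.
a = (3.19, 0.0, 0.0)
b = (-1.595, 2.763, 0.0)
positions = [(0.0, 0.0, 0.0), (1.595, 0.921, 1.56), (1.595, 0.921, -1.56)]
cutoff = 5.0  # hypothetical graph cutoff (Å)

for c_length in (4.0, 20.0):
    edges = periodic_edges(positions, padded_lattice(a, b, c_length), cutoff)
    crossing = sum(1 for _, _, image in edges if image[2] != 0)
    print(f"c = {c_length:4.1f} Å: {len(edges)} edges, {crossing} cross the vacuum")
```

With a 4 Å out-of-plane vector some edges connect the layer to its own periodic image across the vacuum, whereas with a 20 Å vector every edge stays within the layer.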
Key Findings
- Scale of generated and relaxed data: After the initial duplicate removal, 14,192 unique 2D crystals (excluding the seeds) were obtained and relaxed with DFT. Only 22 of the 11,630 structures obtained in total were produced by both methods, indicating strong complementarity between CDVAE and LDP.
- DFT relaxation performance (Table 1), LDP vs CDVAE: success rate 82% vs 69%; average number of relaxation steps 40.1 vs 55.5; average energy decrease during relaxation 0.62 vs 0.51 eV/atom. Although CDVAE structures need somewhat more relaxation steps on average, their energy drops by a comparable (slightly smaller) amount, suggesting that they start roughly as close to their relaxed minima as LDP structures do.
- SCF convergence and magnetism: DFT relaxations fail for ~18% (LDP) and ~31% (CDVAE) of the structures, largely because of SCF convergence problems. Materials containing magnetic 3d elements (V, Cr, Mn, Fe, Co, Ni) fail in ~30% of cases, versus ~10% for compositions without them. CDVAE generates a higher fraction of structures containing magnetic 3d elements (38%) than LDP (30%), consistent with its lower success rate.
- Thermodynamic stability: The ΔH and ΔH_hull distributions of CDVAE- and LDP-generated structures are remarkably similar: 73.8% (CDVAE) and 74.0% (LDP) have ΔH_hull < 0.3 eV/atom, the same stability threshold used for the training set. Many predicted materials lie near the hull: 2004 have ΔH_hull < 50 meV/atom and 3400 have ΔH_hull < 100 meV/atom, suggesting that many of them could be synthesizable. Stability tends to decrease as the number of distinct elements per structure increases.
- Bias and control via training data: A CDVAE trained on unstable materials (ΔH_hull > 0.4 eV/atom) generated structures that, after relaxation, lie further above the hull than those from a model trained on stable materials, demonstrating that CDVAE learns stability-related chemistry from its training set.
- Structural and chemical diversity: CDVAE substantially expands the range of stoichiometries and prototypes beyond both the seeds and LDP. Unique stoichiometries: 239 (CDVAE) vs 103 (LDP) vs 87 (seed). New combinations of space group and occupied Wyckoff positions: 130 combinations covering 357 materials (CDVAE) vs 76 combinations covering 339 materials (LDP). CDVAE tends to produce more complex, lower-symmetry structures; the average number of distinct elements per unit cell is 4.0 (CDVAE) vs 2.6 (seed). t-SNE embeddings show that CDVAE samples are more dispersed and form notable new clusters, such as ABC2D2 in space group 25 with occupied Wyckoff positions a, b, c, d: 123 such materials were discovered, 30 of them within 50 meV/atom of the hull, and this family is absent from both the seed set and the LDP results.
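The stability and composition statistics reported here reduce to simple counts over the relaxed materials. A minimal sketch, assuming a hypothetical record layout with an `e_hull` value in eV/atom and a list of elements per structure; the magnetic 3d set is the one named in the text, and the default thresholds correspond to 50 meV/atom, 100 meV/atom, and 0.3 eV/atom. The data are made up.

```python
MAGNETIC_3D = {"V", "Cr", "Mn", "Fe", "Co", "Ni"}


def summarize(materials, thresholds=(0.05, 0.1, 0.3)):
    """Counts below each dH_hull threshold (eV/atom), fraction containing a magnetic
    3d element, and mean number of distinct elements per structure."""
    n = len(materials)
    below = {t: sum(m["e_hull"] < t for m in materials) for t in thresholds}
    magnetic = sum(1 for m in materials if MAGNETIC_3D & set(m["elements"]))
    mean_elements = sum(len(set(m["elements"])) for m in materials) / n
    return {"n": n, "below": below, "magnetic_fraction": magnetic / n,
            "mean_distinct_elements": mean_elements}


# Hypothetical relaxed materials (dH_hull in eV/atom).
materials = [
    {"e_hull": 0.02, "elements": ["Mo", "S"]},
    {"e_hull": 0.08, "elements": ["Cr", "I"]},
    {"e_hull": 0.25, "elements": ["Fe", "Ge", "Te"]},
    {"e_hull": 0.45, "elements": ["Nb", "Se", "Te", "Br"]},
]
stats = summarize(materials)
print(stats)
```

For these four made-up entries, 1, 2, and 3 materials fall below the three thresholds, half contain a magnetic 3d element, and the mean number of distinct elements is 2.75.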
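Counting new prototypes amounts to labeling each material by its anonymous stoichiometry, space group, and set of occupied Wyckoff positions, then collecting the labels that never occur among the seeds. The sketch below is a hypothetical illustration in that spirit, with labels mirroring the ABC2D2-25-abcd family; in practice the space group and Wyckoff positions would come from a symmetry analysis (for example with spglib), which is not shown, and all records are made up.

```python
import math
import string
from collections import Counter
from functools import reduce


def anonymous_formula(counts):
    """E.g. {"Nb": 1, "Ta": 1, "S": 2, "Cl": 2} -> "ABC2D2" (amounts in ascending order)."""
    g = reduce(math.gcd, counts.values())
    amounts = sorted(n // g for n in counts.values())
    return "".join(letter + (str(n) if n > 1 else "")
                   for letter, n in zip(string.ascii_uppercase, amounts))


def prototype(m):
    """Label of the form stoichiometry-spacegroup-wyckoff, e.g. "ABC2D2-25-abcd"."""
    letters = "".join(sorted(set(m["wyckoff"])))
    return f"{anonymous_formula(m['counts'])}-{m['spacegroup']}-{letters}"


def new_prototypes(generated, seeds):
    """Map each prototype absent from the seeds to the number of generated materials in it."""
    known = {prototype(s) for s in seeds}
    return Counter(p for p in map(prototype, generated) if p not in known)


# Hypothetical records; the Wyckoff letters are illustrative only.
seeds = [{"counts": {"Mo": 1, "S": 2}, "spacegroup": 187, "wyckoff": ["a", "h"]}]
generated = [
    {"counts": {"W": 1, "Se": 2}, "spacegroup": 187, "wyckoff": ["a", "h"]},
    {"counts": {"Nb": 1, "Ta": 1, "S": 2, "Cl": 2}, "spacegroup": 25, "wyckoff": ["a", "b", "c", "d"]},
    {"counts": {"Zr": 2, "Hf": 2, "Se": 4, "I": 4}, "spacegroup": 25, "wyckoff": ["a", "b", "c", "d"]},
    {"counts": {"Ga": 2, "Se": 2}, "spacegroup": 187, "wyckoff": ["h", "i"]},
]
new = new_prototypes(generated, seeds)
print(len(new), "new combinations covering", sum(new.values()), "materials")
```

Here the WSe2-like entry shares the seed's AB2-187-ah label and is not counted, leaving two new combinations that cover three materials.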
Discussion
The results show that CDVAE both learns the stability trends present in the training data and produces diverse crystal structures that go beyond simple element substitutions, thereby addressing the inverse-design challenge for 2D materials. Despite its lower DFT relaxation success rate (linked to a higher content of magnetic elements and the associated SCF convergence problems), the relaxed CDVAE structures have stability distributions comparable to those from LDP. The ability to generate structures with novel stoichiometries, space groups, and Wyckoff occupation patterns, including entire new stable families (e.g., ABC2D2-25-abcd), demonstrates genuine exploration beyond the design space of the seeds. Taking ΔH_hull as a proxy for synthesizability, a significant subset of the generated materials, particularly those within 50–100 meV/atom of the hull, are promising experimental targets. CDVAE and LDP are thus complementary: LDP efficiently explores the known blueprint space, while CDVAE introduces new structural “genes,” expanding discovery beyond lattice decoration.
Conclusion
The study generated a large set of relaxed 2D materials from a common seed set using both CDVAE and LDP, identifying more than 8500 unique crystals within 0.3 eV/atom of the convex hull and over 2000 within 50 meV/atom, potentially doubling the number of known stable 2D materials. CDVAE performs well in two respects: (i) it learns stability characteristics from its training data, and (ii) it generates chemically and structurally diverse crystals, including many new prototypes with low symmetry and complex compositions. Given this ability to go beyond substitution-based design, deep generative modeling appears highly promising for autonomous 2D materials discovery. Future work could improve SCF robustness (particularly for magnetic systems), incorporate kinetic and substrate effects into stability and synthesizability assessments, tailor the generative prior to better match the learned latent distribution, and couple generative models with property predictors to target functional applications.
Limitations
- DFT relaxation failures caused by SCF convergence problems, especially for materials containing magnetic 3d elements, limit the yield and may bias the statistics toward easier-to-converge systems.
- ΔH_hull is a thermodynamic proxy that neglects synthesis kinetics and substrate effects, both of which can be crucial for the experimental realization of 2D materials.
- CDVAE samples latent vectors from a standard Gaussian, which may not match the actual latent distribution of the training data; such out-of-distribution sampling could lead to overly complex compositions and low-symmetry structures.
- Adapting CDVAE to 2D via artificial periodicity could introduce subtle modeling biases compared with an explicitly non-periodic treatment.
- CDVAE's lower relaxation success rate relative to LDP may affect the comparative stability statistics.