Engineering and Technology
Predicting Synthesizability of Crystalline Materials via Deep Learning
Ali Davariashtiyani, Zahra Kadkhodaie, Sara Kadkhodaei
Discover how a deep-learning model leverages three-dimensional images of crystal structures to predict the synthesizability of hypothetical crystals. This groundbreaking research, conducted by Ali Davariashtiyani, Zahra Kadkhodaie, and Sara Kadkhodaei, showcases an innovative approach to identifying viable materials for battery electrodes and thermoelectric applications.
~3 min • Beginner • English
Introduction
The study addresses the core question of whether hypothetical crystalline materials, across diverse structures and compositions, are likely to be synthesizable. Traditional synthesis planning relies on expert knowledge of processing conditions, thermodynamics, kinetics, and scale, making general predictive metrics difficult. Prior energy-based benchmarks (e.g., energy above ground state or amorphous limits) are limited in scope and fail to reliably identify synthesizability across compositions, often missing low-energy unsynthesizable and high-energy synthesizable phases. The purpose of this work is to develop a general, accurate deep-learning framework that captures both structural and chemical features of crystals to predict synthesizability, thereby accelerating materials discovery and reducing trial-and-error.
Literature Review
Earlier work used thermodynamic energy metrics to assess synthesizability. Sun et al. showed that many low-energy hypothetical crystals remain unobserved, challenging energy above the ground state as a reliable metric. Aykol et al. proposed the amorphous solid enthalpy as an upper bound for synthesizable crystals (the stability skyline), which can flag high-energy anomalies but is composition-specific and cannot capture low-energy anomalies or high-energy synthesizable phases (e.g., high-pressure phases). Machine-learning efforts include Hautier et al.'s probabilistic ion-substitution models; Ryan et al.'s neural networks with atomic fingerprints that predict site-substitution likelihood; Aykol et al.'s network models incorporating discovery timelines and circumstantial factors to predict experimental success; and studies using expert-knowledge-based parameters or literature text mining to infer synthesis conditions (Kim et al., Raccuglia et al., Tang et al.). These methods are often constrained to specific compositions or structure types. The present study advances on them by learning latent structural and chemical patterns directly from 3D crystal representations, generalizing across structures and chemistries.
Methodology
Data collection and labeling: Synthesizable crystals (positive class) were sourced from the Crystallography Open Database (COD, 2019). A total of 3000 samples were selected: all distinct polymorphs of the 108 most-studied compositions used for anomaly generation (367 samples), plus 2633 samples from other compositions. Crystal anomalies (negative class) were generated for the compositions most frequently mentioned in the materials science literature (top 0.1% by frequency, using Tshitoyan et al.'s NLP model, 1922–2018). For the top 108 compositions (≥3306 mentions each), hypothetical structures were generated with the Crystal Structure Prototype Database (CSPD) toolkit, excluding any structures already present in COD. For each composition, the number of generated anomalies was capped at its number of COD polymorphs (with a minimum of five), yielding 600 anomaly samples. The dataset was split into training (49%), validation (21%), and test (30%) sets. Owing to class imbalance, negative samples in the training set were randomly duplicated to balance the classes.
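The class-balancing step described above can be sketched as simple oversampling by duplication (a minimal illustration; the function name, toy sample IDs, and fixed seed are assumptions, not from the paper):

```python
import random

def balance_by_duplication(positives, negatives, seed=0):
    """Balance a binary training set by randomly duplicating minority-class
    samples until both classes are equally represented (as described for the
    negative 'anomaly' class in the training split)."""
    rng = random.Random(seed)
    minority, majority = sorted([positives, negatives], key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

# Toy illustration: 7 synthesizable vs. 2 anomaly samples.
pos = [f"cod_{i}" for i in range(7)]
neg = ["anomaly_0", "anomaly_1"]
neg_balanced, pos_kept = balance_by_duplication(pos, neg)
assert len(neg_balanced) == len(pos_kept) == 7
```

Duplication (rather than synthetic augmentation) keeps the negative class unchanged in content while equalizing its weight in the loss, which is the behavior the limitation on decision boundaries refers to.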
Crystal representation: Each CIF was parsed with ASE and converted into a 3D voxel image. The unit cell was replicated to fill a 70 Å cube, then digitized into 128×128×128 voxels with three channels per voxel: normalized atomic number, periodic-table row, and group; empty voxels are zeros. Lanthanides and actinides were assigned group 3.5. Channels were normalized to [0, 1] by dividing by their maxima (Z/118, row/7, group/19). To avoid placing multiple atoms in a single voxel, crystals with a nearest-neighbor distance below 0.947 Å were excluded (primarily hydrogen-containing structures). Structures with partial occupancies were converted to ordered supercells using the Supercell program.
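The voxel encoding can be illustrated with a minimal sketch, assuming a single cubic cell with atoms given in fractional coordinates and skipping the 70 Å replication step; the reduced 32³ resolution and the two-atom toy structure are illustrative only:

```python
import numpy as np

# Per-element (Z, row, group); lanthanides/actinides would use group 3.5.
PROPS = {"Na": (11, 3, 1), "Cl": (17, 3, 17)}

def voxelize(symbols, frac_coords, n=32):
    """Map atoms in a cubic cell onto an n**3 grid with three channels per
    voxel: Z/118, row/7, group/19. Empty voxels stay zero."""
    img = np.zeros((n, n, n, 3), dtype=np.float32)
    for sym, f in zip(symbols, frac_coords):
        # Fractional coordinates -> voxel indices, wrapped periodically.
        i, j, k = (np.asarray(f) * n).astype(int) % n
        z, row, group = PROPS[sym]
        img[i, j, k] = (z / 118.0, row / 7.0, group / 19.0)
    return img

# Rock-salt-like toy example: two atoms in a cubic cell.
img = voxelize(["Na", "Cl"], [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)])
assert img.shape == (32, 32, 32, 3)
assert np.count_nonzero(img) == 6  # two occupied voxels x three channels
```

The 0.947 Å cutoff mentioned above follows from this scheme: at the paper's resolution (70 Å / 128 voxels ≈ 0.55 Å per voxel edge), atoms closer than roughly one voxel diagonal risk collapsing into the same cell.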
Models:
- Supervised CNN classifier: A convolutional encoder (three layers of 3×3×3 filters with ReLU activations, each followed by 4×4×4 max-pooling) learns a latent representation jointly with a fully connected MLP classifier (three hidden layers of 13 nodes each). Training used Adam to minimize binary cross-entropy on the synthesizability labels. Total trainable parameters: ~61,703. Decision threshold: 0.5.
- Unsupervised CAE + MLP: A convolutional autoencoder (encoder: three conv layers with ReLU, filters 3×3×3; pooling 4×4×4 for first two layers, 2×2×2 for third; outputs 32, 32, and 64 channels; decoder mirrors with upsampling; final 3-channel sigmoid output) learns latent representations by minimizing per-voxel binary cross-entropy reconstruction loss via Adam. Dropout (30%) after each pooling/upsampling regularizes and reduces overfitting. Trainable parameters: ~281,923. The flattened latent vector feeds a separate MLP classifier (same architecture as above), trained on labeled data.
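As a quick sanity check on the dimensionality reduction performed by the two encoders (assuming size-preserving convolution padding, which the text does not state explicitly):

```python
def pooled_size(input_size, pool_sizes):
    """Spatial edge length after successive max-pooling stages, with each
    3x3x3 convolution assumed to preserve spatial size."""
    size = input_size
    for p in pool_sizes:
        size //= p
    return size

# Supervised CNN encoder: three 4x4x4 pools -> 128 -> 32 -> 8 -> 2.
assert pooled_size(128, [4, 4, 4]) == 2

# CAE encoder: pools of 4, 4, 2 -> spatial size 4; with 64 channels in the
# last conv layer, the flattened latent vector has 4**3 * 64 = 4096 entries.
assert pooled_size(128, [4, 4, 2]) == 4
assert pooled_size(128, [4, 4, 2]) ** 3 * 64 == 4096
```

This compression (from 128³ × 3 ≈ 6.3 million input values down to a few thousand latent features) is what keeps the MLP heads small enough to train on ~3600 labeled samples.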
Baselines and ablations: A raw-image MLP baseline that directly consumes flattened voxels (no convolutional feature learning; >80 million parameters) was also trained; it overfit and underperformed, underscoring the need for learned latent features.
Evaluation: Primary metric is ROC-AUC on the held-out test set (1080 images). Accuracy, sensitivity (recall on positive class), and specificity were reported at threshold 0.5. Generalization was assessed on out-of-distribution sets: 2088 candidate battery electrode crystals (Materials Project Battery Explorer; 264 present in COD) and 122 thermoelectric candidate crystal structures (56 COD, 66 MP) derived from literature-based composition suggestions by Tshitoyan et al. Predictions were also compared with the energy-based stability skyline model to assess ability to identify low-energy anomalies and high-energy synthesizable crystals. A MoS2 polymorph case study demonstrated composition-specific structure ranking.
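The reported metrics can be reproduced from raw predictions with a minimal sketch (ROC-AUC via its Mann–Whitney formulation; function names and the toy labels/scores are illustrative):

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive sample outscores a
    random negative one (Mann-Whitney U formulation; ties count as half)."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Recall on each class at a fixed decision threshold."""
    labels = np.asarray(labels)
    pred = np.asarray(scores) >= threshold
    sens = pred[labels == 1].mean()     # recall on synthesizable class
    spec = (~pred)[labels == 0].mean()  # recall on anomaly class
    return sens, spec

# Toy predictions: 3 synthesizable (label 1) and 2 anomaly (label 0) samples.
y = [1, 1, 1, 0, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.6]
assert abs(roc_auc(y, p) - 5 / 6) < 1e-9
```

Note that ROC-AUC is threshold-free, while the reported accuracy, sensitivity, and specificity all depend on the 0.5 cutoff; this is why the CNN and CAE+MLP can have similar AUC yet noticeably different specificity.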
Key Findings
- Test set performance: CNN classifier ROC-AUC 0.981, accuracy 93.7%; CAE+MLP ROC-AUC 0.968, accuracy 91.9% (threshold 0.5). From Fig. 3e, sensitivities were ~0.962 (CNN) and ~0.943 (CAE+MLP), while specificities were ~0.932 (CNN) and ~0.799 (CAE+MLP), indicating higher specificity for the supervised CNN.
- Importance of feature learning: Raw-image MLP baseline achieved ROC-AUC 0.685 and 80% accuracy, worse than a trivial classifier labeling all samples as synthesizable (83%), confirming the necessity of convolutional feature learning.
- Electrode materials (2088 samples; 264 in COD): Recall on the COD subset was 82% (CNN) and 89% (CAE+MLP). Among the 1824 non-COD candidates, 73% (CNN) and 85% (CAE+MLP) were predicted synthesizable. Plotting synthesizability likelihood against volumetric capacity and average voltage enables Ashby-style materials selection.
- Thermoelectric materials (122 samples; 56 COD, 66 MP): Recall on COD subset was 64.3% (CNN) and 78.6% (CAE+MLP). Among 66 MP (non-COD) candidates, predicted synthesizable fractions were 68.2% (CNN) and 89.4% (CAE+MLP). Selected top predictions included Mo3Te4 (CAE+MLP 1.00), SbTm polymorphs, P2SnZn, and CuTe/Cu2Se2Tl.
- MoS2 case study: The hexagonal P63/mmc phase (2H) received a synthesizability likelihood of 0.82 (CAE+MLP), consistent with its known stability and Pauling’s rules (Mo4+ CN=4; S2− CN=2). Several tetragonal polymorphs were also predicted synthesizable, aligning with reported metastable tetragonal phases.
- Comparison with stability skyline: The model correctly identified both low-energy anomalies and high-energy synthesizable crystals across 815 materials analyzed previously, overcoming the composition-specific energy limit by learning complex structural/chemical patterns.
Discussion
The proposed framework jointly captures global structural and chemical information through 3D voxel images and deep feature learning, enabling accurate and general synthesizability predictions across diverse crystal structures and compositions. Supervised learning yields high specificity within-distribution, while unsupervised feature learning (CAE+MLP) shows stronger generalization on out-of-distribution sets (electrodes and thermoelectrics), underscoring the value of unlabeled representation learning. Compared to energy-threshold models (stability skyline), this approach identifies exceptions such as low-energy unsynthesizable and high-energy synthesizable phases, recognizing that synthesizability depends on more than thermodynamics (e.g., kinetics, routes). Relative to atomic-fingerprint methods limited to local topology and specific structures, the voxel representation and convolutional encoders leverage long-range translational symmetries and richer global patterns. Although the learned features are not directly interpretable, they effectively map latent structural/chemical patterns to synthesizability likelihood, enabling materials screening and multi-objective selection (e.g., with voltage or capacity).
Conclusion
This work introduces a deep-learning framework that represents crystals as 3D chemically encoded images and learns latent features via CNNs and CAEs to predict synthesizability across structures and compositions. The models achieve high test ROC-AUC (0.981 CNN; 0.968 CAE+MLP) and perform well on external datasets (battery electrodes and thermoelectrics), with the unsupervised CAE+MLP showing superior generalization. The approach surpasses energy-only metrics by capturing complex structural and chemical patterns, enabling identification of both low-energy anomalies and high-energy synthesizable phases. Future directions include enhancing interpretability via additive feature attribution (e.g., layer-wise relevance propagation/SHAP), expanding and refining anomaly labels to reduce sample bias, incorporating process-related metadata, increasing image resolution or alternative graph/point-cloud representations, and integrating the model into active discovery workflows for targeted synthesis.
Limitations
- Labeling of crystal anomalies relies on unobserved structures for the most-studied compositions; expanding beyond a few hundred compositions increases risk of mislabeling synthesizable (positive) cases as anomalies (negative), potentially harming predictive power.
- Composition coverage and potential sample bias arise from selecting anomalies only among top 0.1% most-studied compositions.
- Image discretization imposes a nearest-neighbor cutoff (0.947 Å) and 128^3 resolution, excluding some hydrogen-rich or very dense structures; higher resolution increases computational cost.
- The models are black boxes; limited interpretability of learned features complicates physical insight.
- The performance drop on external sets (lower recall for thermoelectrics and electrodes than on the test set) reflects distribution shift and smaller sample sizes.
- Training depends on COD (2019) data quality and completeness; partial occupancies require supercell approximations that may introduce artifacts.
- Class imbalance requires duplication of negative samples in training, which may influence decision boundaries.