Perovskite synthesizability using graph neural networks
G. H. Gu, J. Jang, et al.
Geun Ho Gu and colleagues present a graph neural network model that predicts the synthesizability of perovskites. The model reaches an out-of-sample true positive rate of 0.957, outperforming empirical tolerance-factor methods, and is used to identify new material candidates for applications such as photovoltaics and solid electrolytes.
Introduction
The study addresses the challenge of predicting the synthesizability of perovskite materials, a key step for accelerating high-throughput materials discovery. Conventional screening based on thermodynamic stability (e.g., energy above the convex hull) neglects kinetic factors and synthesis conditions, limiting its predictive power for experimental realization. Moreover, synthesizability prediction cannot rely on standard binary classification due to the lack of true negative (unsynthesizable) examples in databases, which contain only previously synthesized (positive) and virtual (unlabeled) structures. Prior positive–unlabeled (PU) learning approaches achieved respectable overall accuracy but underperformed for perovskite subspaces. Given the broad technological relevance of perovskites (photovoltaics, LEDs, magnetism, superconductivity, solid electrolytes), the authors aim to develop a perovskite-focused, generalizable, structure-based model with improved accuracy using domain-specific transfer learning combined with PU learning and graph neural networks.
Literature Review
- Thermodynamics-based metrics: Energy above convex hull is commonly used to assess stability and potential synthesizability but omits synthesis kinetics and processing conditions, limiting reliability.
- PU learning for synthesizability: Prior work applied transductive PU learning with crystal graph convolutions to define a crystal-likeness (CL) score for Materials Project entries, achieving ~87% out-of-sample accuracy overall but only ~74% within perovskites.
- Heuristics and classical ML for perovskites: Goldschmidt tolerance factor and an ML-derived tolerance factor (SISSO) have been used to classify perovskite formability, alongside models using gradient boosting, SVMs, and random forests. These approaches largely target ionic oxide perovskites and rely on Shannon ionic radii, limiting applicability to covalent, halide, hydride, and anti-perovskites.
- Domain-specific and transfer learning: Training on specific material domains can improve accuracy, and transfer learning enables effective model building with smaller domain datasets by leveraging knowledge from larger general datasets.
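For reference, the Goldschmidt tolerance factor mentioned above is a purely geometric quantity, t = (r_A + r_X) / (sqrt(2) * (r_B + r_X)), computed from ionic radii. The sketch below evaluates it in Python. The radii are approximate Shannon values supplied for illustration, and the 0.825 to 1.059 window is one commonly quoted rule of thumb, assumed here rather than taken from the study.

```python
import math


def goldschmidt_tolerance(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt tolerance factor for an ABX3 perovskite (radii in angstroms)."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def looks_formable(t: float, low: float = 0.825, high: float = 1.059) -> bool:
    """Assumed rule-of-thumb window; published ranges vary between sources."""
    return low <= t <= high


# Approximate Shannon radii for SrTiO3 (illustrative values, in angstroms).
t_srtio3 = goldschmidt_tolerance(r_a=1.44, r_b=0.605, r_x=1.40)
print(f"t = {t_srtio3:.3f}, formable by the heuristic: {looks_formable(t_srtio3)}")
```

Because the factor depends only on tabulated ionic radii, it has no natural input for covalent, hydride, or anti-perovskite chemistries, which is the limitation a structure-based model is meant to address.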
Methodology
- Data collection and labeling: From Materials Project (MP, Oct 2020), 46,546 crystals with ICSD IDs and experimental tags were labeled as positive (synthesizable), and 79,789 entries without ICSD IDs were treated as unlabeled. Perovskite candidates were then gathered from MP, OQMD, and AFLOW (Oct 2020). Duplicates were removed using pymatgen's StructureMatcher and AFLOW perovskite prototypes, yielding 943 synthesized (positive) and 11,964 virtual (unlabeled) perovskite crystals.
- Learning framework: Inductive positive–unlabeled (PU) learning was used in two stages. First, a base model was pre-trained on the general MP dataset (positives + unlabeled). Next, transfer learning was applied to fine-tune the model on the perovskite-only dataset (domain-specific learning). To ensure unbiased evaluation, 10% of the positive data were held out from both MP and perovskite sets as a test set not seen during pre-training.
- Model architecture: A graph neural network (crystal graph convolution) encodes structures. Graph construction uses Voronoi neighbors within a 7 Å radius. Node (atom) features are element one-hot encodings. Edge features include Gaussian-expanded interatomic distances and Gaussian-expanded solid angles from the Voronoi tessellation. The network comprises edge and node updates via dense/linear layers with softplus activations, followed by a pooling operation (minimum pooling) and sigmoid activation to output a synthesizability (CL) score. Further architectural details are given in the Methods section and figures of the original paper.
- Evaluation and comparison: Model performance (out-of-sample true positive rate) was compared against non-domain-specific PU models and empirical rule-based methods (SISSO-based tolerance factor and Goldschmidt screening). Post hoc validation included literature mining to check whether high-CL virtual perovskites had been synthesized and whether the lowest-CL virtual entries remained unreported.
- Application screening: Predicted synthesizable candidates were further filtered for applications. For metal halide perovskites, DFT screening (PBEsol relaxation, HSE06 single-point) assessed band gaps, identifying photovoltaics-relevant ranges. For Li-rich anti-perovskites, candidates with high CL scores were compiled with their thermodynamic stability (energy above hull).
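The edge featurization and readout described in the model-architecture item can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the number of Gaussian centers, the width, and the function names are assumptions, and the per-atom values are placeholders standing in for the outputs of trained graph-convolution layers, assumed here to be already reduced to one scalar per atom.

```python
import math
from typing import List, Optional


def gaussian_expand(d: float, d_min: float = 0.0, d_max: float = 7.0,
                    n_centers: int = 8, width: Optional[float] = None) -> List[float]:
    """Expand a scalar distance onto evenly spaced Gaussian basis functions."""
    step = (d_max - d_min) / (n_centers - 1)
    width = step if width is None else width  # assumed default: one grid step
    centers = [d_min + i * step for i in range(n_centers)]
    return [math.exp(-((d - c) ** 2) / width ** 2) for c in centers]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def readout(atom_values: List[float]) -> float:
    """Minimum pooling over per-atom scalars, then a sigmoid, giving a score in (0, 1)."""
    return sigmoid(min(atom_values))


features = gaussian_expand(2.0)      # 8 numbers describing a 2.0 angstrom bond
score = readout([1.2, 0.3, 2.5])     # placeholder per-atom values
print(len(features), round(score, 3))
```

With minimum pooling, the least favorable atomic environment sets the score for the whole crystal, so a single unusual site is enough to pull the prediction down.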
Key Findings
- Accuracy: The perovskite-focused transferred model achieved a 0.957 out-of-sample true positive rate, markedly higher than empirical geometric-factor methods (0.806–0.863) and the roughly 74% out-of-sample accuracy that prior non-domain-specific PU models reached within perovskites.
- Synthesizable predictions: Out of 11,964 virtual perovskites, 962 were predicted to be synthesizable. Literature validation found 179 of these had indeed been synthesized, while none of the 1,000 lowest-CL virtual crystals were reported, supporting predictive value.
- Generality across perovskite classes: The model successfully predicts synthesizability for classical ionic, covalent, hydride, halide, and anti-perovskites, addressing limitations of ionic radii-based heuristics.
- Comparative performance: The non-domain-specific MP-trained model predicted only 101 of the 179 literature-synthesized virtual perovskites as synthesizable, underscoring the benefit of domain-specific transfer learning.
- Anti-perovskites: High CL scores were observed broadly for anti-perovskites with anions (C, N, O, P) on the B site and transition metals on the C site; 327 virtual anti-perovskites were predicted synthesizable.
- Metal halide candidates: 98 virtual metal halide perovskites were predicted synthesizable; 43 had computed band gaps, and 12 candidates exhibited band gaps between 0.7 and 2.0 eV (PV-relevant). Of these 12, 8 were thermodynamically stable (energy above hull < 0.1 eV/atom).
- Li-rich anti-perovskites: Eight Li-rich anti-perovskite candidates with CL > 0.5 were identified. The previously reported Li3OBr and Li3OCl lie close to the convex hull (energy above hull of 0.012 and 0.006 eV/atom, respectively), while the newly predicted ones tend to be less thermodynamically stable (energy above hull > 0.3 eV/atom).
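The counts reported above translate into a few simple ratios, worked out here for convenience. Only numbers quoted in this summary are used; the variable names are illustrative.

```python
virtual_total = 11_964       # virtual (unlabeled) perovskites screened
predicted_synth = 962        # predicted synthesizable by the transferred model
lit_confirmed = 179          # of those, found as synthesized in the literature
general_model_hits = 101     # of the 179, recovered by the general MP-trained model

frac_predicted = predicted_synth / virtual_total
frac_confirmed = lit_confirmed / predicted_synth
general_recovery = general_model_hits / lit_confirmed

print(f"predicted synthesizable: {frac_predicted:.1%}")
print(f"confirmed by literature: {frac_confirmed:.1%}")
print(f"recovered by general model: {general_recovery:.1%}")
```

The literature-confirmed share is a floor rather than a precision estimate, since the remaining predictions are untested, not disproven.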
Discussion
The work demonstrates that a domain-specific, transfer-learned PU graph neural network can robustly assess synthesizability of perovskites, addressing the unique challenge of lacking negative labels in materials databases. By focusing on perovskites and leveraging structural representations beyond ionic-radius heuristics, the model substantially improves true positive recognition and extends applicability to covalent, halide, hydride, and anti-perovskite chemistries. Validation through literature mining strengthens confidence that high-CL predictions reflect realistic syntheses, while the absence of reports among the lowest-CL cases supports the model’s discriminative utility. The results also show that synthesizability as captured by CL scores complements thermodynamic metrics: some high-CL candidates possess higher energy above hull, suggesting kinetic accessibility or non-equilibrium syntheses might be relevant. Combining CL-based assessments with thermodynamic stability can yield more reliable discovery pipelines. The model’s outputs highlight underexplored composition spaces (e.g., anti-perovskites) and identify candidates for photovoltaics and solid-state electrolytes, providing actionable targets for experimental validation.
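The suggestion above of combining CL-based assessment with thermodynamic stability can be sketched as a two-condition filter. The 0.5 CL cutoff and the 0.1 eV/atom hull cutoff mirror values quoted in the key findings. The candidate records, their names, and the field names are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    cl_score: float   # synthesizability (CL) score in [0, 1]
    e_hull: float     # energy above the convex hull, eV/atom


def shortlist(cands: List[Candidate], cl_min: float = 0.5,
              e_hull_max: float = 0.1) -> List[Candidate]:
    """Keep candidates passing both cutoffs, ranked by descending CL score."""
    kept = [c for c in cands if c.cl_score > cl_min and c.e_hull < e_hull_max]
    return sorted(kept, key=lambda c: c.cl_score, reverse=True)


# Hypothetical placeholder entries, made up for illustration only.
demo = [
    Candidate("placeholder-1", 0.91, 0.02),
    Candidate("placeholder-2", 0.88, 0.35),  # high CL, far above the hull
    Candidate("placeholder-3", 0.30, 0.01),  # near the hull, low CL
    Candidate("placeholder-4", 0.62, 0.08),
]
picked = [c.name for c in shortlist(demo)]
print(picked)
```

A hard cutoff on energy above hull would discard the high-CL, high-energy candidates that the discussion flags as possibly reachable by non-equilibrium routes, so in practice the hull threshold may be better treated as a ranking signal than as a strict gate.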
Conclusion
The study introduces a perovskite-focused synthesizability predictor using graph neural networks with positive–unlabeled and transfer learning, achieving a 0.957 out-of-sample true positive rate and outperforming empirical heuristics. It generalizes across diverse perovskite types and identifies 962 potentially synthesizable virtual perovskites, with 179 validated by literature. The approach uncovers rich opportunities in anti-perovskites and proposes candidate metal halides with PV-suitable band gaps and Li-rich anti-perovskites for solid electrolytes. Future work should integrate CL-based synthesizability with thermodynamic and kinetic modeling to enhance precision, expand to other crystal families via domain-specific transfer learning, and support experimental efforts to validate and refine the predictive framework.
Limitations
- Lack of explicit negative data (unsynthesizable examples) necessitates PU learning, which can exhibit lower precision and yield false positives.
- Some high-CL candidates show high energy above the convex hull (from above 0.3 up to about 1 eV/atom), indicating possible thermodynamic instability despite predicted synthesizability.
- The perovskite subset size, while curated, remains smaller than general databases, potentially limiting coverage of rare chemistries and affecting generalization at the margins.
- The model does not explicitly account for synthesis pathways, kinetics, or processing conditions; CL scores capture structural-likeness rather than full synthesis feasibility.
- Reported performance emphasizes true positive rate; comprehensive precision–recall or calibrated probability assessments are not detailed in the provided text.
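Because only positive labels are available, the metric that can be measured directly is the true positive rate on held-out synthesized structures. A minimal sketch of that calculation follows. The 0.5 decision threshold is assumed from the CL > 0.5 criterion quoted in the key findings, and the score list is a made-up placeholder rather than model output.

```python
from typing import Sequence


def true_positive_rate(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of known-positive samples whose score exceeds the threshold."""
    if not scores:
        raise ValueError("need at least one held-out positive sample")
    hits = sum(1 for s in scores if s > threshold)
    return hits / len(scores)


# Placeholder scores for ten held-out synthesized crystals (illustrative only).
held_out_scores = [0.97, 0.88, 0.42, 0.73, 0.91, 0.66, 0.55, 0.99, 0.81, 0.60]
tpr = true_positive_rate(held_out_scores)
print(f"TPR = {tpr:.2f}")
```

No analogous calculation exists for the false positive rate, because it would require structures known to be unsynthesizable.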