Machine learning-enabled chemical space exploration of all-inorganic perovskites for photovoltaics

J. Kim, J. Noh, et al.

Discover a groundbreaking framework for designing B-site-alloyed ABX3 metal halide perovskites using advanced DFT and ML techniques. Researchers Jin-Soo Kim, Juhwan Noh, and Jino Im identify 10 promising compounds, including CsGe0.3125Sn0.6875I3, optimized for next-generation solar cells.

Introduction
Metal halide perovskites (ABX3) exhibit excellent optoelectronic properties and are promising for photovoltaics and other devices, but face challenges including Pb toxicity and poor stability, particularly associated with organic A-site cations. Reducing Pb content and improving stability without sacrificing performance is a key goal. Substitutional alloying at A-, B-, and X-sites has emerged as a route to tune stability and properties; high-entropy mixing can improve thermodynamic stability via configurational entropy. However, the combinatorial chemical and configurational spaces are enormous, and experimental exploration is impractical. Prior computational studies using DFT and DFT+ML have screened mixed perovskites, but typically sample random configurations (e.g., SQS) and do not guarantee identification of the most stable atomic configuration at each composition. This work addresses that gap by proposing a DFT/ML framework that explicitly explores all possible B-site atomic configurations for each composition, to identify stable all-inorganic ABX3 perovskites with favorable, near-ideal photovoltaic bandgaps.
Literature Review
Previous works used DFT or hybrid DFT to study mixed perovskites (e.g., SQS-based modeling of entropy effects in double perovskites; DFT datasets of ABX3 alloys) and ML approaches such as crystal site feature embedding (CSFE) for bandgap predictions and neural-network screening based on elemental descriptors. These studies identified promising alloys, including low-percentage dopants tuning bandgaps and hundreds of candidate absorbers. Yet, most did not exhaustively explore all atomic configurations for each composition and often did not incorporate mixing-entropy stabilization explicitly. Tolerance factor strategies (Goldschmidt, Filip’s geometric limits) and the newer data-driven Bartel tolerance factor have been proposed for perovskite formability, with varying accuracy. This study advances prior work by: (1) training CGCNN models on 3,159 PBEsol-relaxed B-site-alloyed ABX3 structures, (2) incorporating configurational entropy in stability screening, (3) exhaustively exploring all configurations for compositions at a finer 1/16 step up to quaternary B-site mixing, and (4) validating electronic structures with PBE0+SOC.
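As an illustration of the Bartel tolerance factor mentioned above, the sketch below evaluates τ = rX/rB − nA·(nA − (rA/rB)/ln(rA/rB)) with a composition-weighted B-site radius. The function names are hypothetical, and the ionic radii are assumed, illustrative Shannon-type values rather than the ones used in the study.

```python
import math

def bartel_tau(r_a: float, r_b: float, r_x: float, n_a: int = 1) -> float:
    """Bartel tolerance factor; lower values (< 4.18) suggest a perovskite."""
    ratio = r_a / r_b
    if ratio <= 1.0:
        # The logarithm term is only meaningful when the A cation is larger.
        raise ValueError("expected r_A > r_B")
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))

def weighted_b_radius(fractions: dict, radii: dict) -> float:
    """Composition-weighted average radius over the mixed B-site."""
    total = sum(fractions.values())
    return sum(x * radii[el] for el, x in fractions.items()) / total

# Assumed, illustrative ionic radii in angstrom (not taken from the paper).
R_A = {"Cs": 1.88}
R_B = {"Ge": 0.73, "Sn": 1.15, "Pb": 1.19}
R_X = {"I": 2.20}

if __name__ == "__main__":
    b_site = {"Ge": 5 / 16, "Sn": 11 / 16}  # CsGe0.3125Sn0.6875I3
    r_b = weighted_b_radius(b_site, R_B)
    tau = bartel_tau(R_A["Cs"], r_b, R_X["I"], n_a=1)
    print(f"weighted r_B = {r_b:.3f} A, tau = {tau:.2f}")
```

The result is sensitive to the radii chosen, so values computed this way should be read as a rough formability indicator only.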
Methodology
- Data generation: Built a PBEsol DFT dataset of 3,159 B-site-alloyed ABX3 structures using a 20-atom cell (four formula units). A-site: Cs, K, Rb; X-site: Br, Cl, I; B-site: Cd, Ge, Hg, Pb, Sn, Zn, with up to quaternary mixing at a compositional step of 1/4, considering all atomic configurations per composition. DFT settings: VASP with PAW pseudopotentials, 500 eV cutoff, Γ-centered k-mesh (auto, 50 Å length), full relaxation (energy 1e-5 eV, forces 0.01 eV/Å). Band structures were computed at the PBEsol level to provide training labels.
- Targets: (1) Decomposition enthalpy ΔHdecomp = E(ABX3) − E(AX) − Σi xi E(BiX2), using the most stable AX and BX2 phases from the Materials Project (GeCl2 from OQMD). (2) Bandgap Egap. (3) Band type: indirect vs non-indirect (direct/metallic/semi-metallic). The configurational mixing entropy is ΔSmix = −kB Σi xi ln xi, and the stability metric used in screening was ΔHdecomp − TΔSmix at 298 K.
- CGCNN training: Used unrelaxed CsPbI3-type input structures to predict properties of the PBEsol-relaxed structures. Three separate models were trained: regression for ΔHdecomp, regression for Egap, and binary classification for band type. Default CGCNN features and hyperparameters were used, with a 70/10/20 train/validation/test split and early stopping.
- Chemical space exploration: Employed an 80-atom cell (16 B-sites), enabling 1/16 compositional resolution (6.25 at.%). A-site {Cs, K, Rb}, X-site {Br, Cl, I}; B-site {Ge, Sn, Pb, Zn, Cd, Hg} under binary, ternary, and quaternary mixing. Total search: 41,400 compositions (including 20,475 quaternary compositions with A fixed to Cs) and ~5.6×10^6 atomic configurations. For each composition, all configurations were evaluated by CGCNN to identify the one with the lowest ΔHdecomp − TΔSmix. The Bartel tolerance factor τ was computed (using a composition-weighted rB) and used alongside stability.
- Screening criteria: (1) Non-indirect band structure, (2) CGCNN-predicted PBEsol Egap < 0.5 eV (to target PBE0 ~1.2–1.4 eV for single-junction or 1.0–2.0 eV for tandem top cells), (3) τ < 4.18, (4) ΔHdecomp − TΔSmix < 0 and as low as possible. To avoid a bias toward Ge-dominated compositions and to emphasize practical PV compositions, the three candidates with the lowest ΔHdecomp − TΔSmix were selected per τ interval (τ < 4.18), with the additional requirement of ≥50% Sn or Pb. This yielded 110 candidates.
- DFT validation and advanced properties: Validated the 110 candidates with PBEsol DFT for stability and band structure; then computed PBE0+SOC band structures for those with direct PBEsol bandgaps, identifying 31 candidates near optimal gaps. Calculated effective masses from PBEsol bands using sumo, optical absorption with PBE0+SOC (LOPTICS, NEDOS=2000, Γ-centered 4×4×4), and spectroscopic limited maximum efficiency (SLME) via SL3ME (AM1.5G spectrum).
- Additional analyses: Correlation analysis of elemental fractions vs ΔHdecomp, τ, Egap, and band type; consistency of τ with stability; entropy effects on ΔHdecomp distributions.
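The stability metric and screening criteria described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the ΔHdecomp input is a made-up placeholder, and both energy and entropy are assumed to be expressed per formula unit (one B-site each), whereas the source reports energies per atom.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def mixing_entropy(fractions) -> float:
    """Ideal configurational entropy -kB * sum(x ln x), in eV/K per mixed site."""
    return -K_B_EV_PER_K * sum(x * math.log(x) for x in fractions if x > 0)

def stability_metric(dh_decomp: float, fractions, temperature: float = 298.0) -> float:
    """Return dH_decomp - T * dS_mix (eV per formula unit under the stated assumption)."""
    return dh_decomp - temperature * mixing_entropy(fractions)

def passes_screen(is_indirect: bool, egap_pbesol: float, tau: float, metric: float) -> bool:
    """Apply the four thresholds quoted in the text."""
    return (not is_indirect) and egap_pbesol < 0.5 and tau < 4.18 and metric < 0.0

if __name__ == "__main__":
    b_fractions = [5 / 16, 11 / 16]   # e.g. Ge0.3125Sn0.6875 on the B-site
    dh = -0.020                       # hypothetical dH_decomp in eV per formula unit
    metric = stability_metric(dh, b_fractions)
    print(f"T*dS_mix = {298.0 * mixing_entropy(b_fractions) * 1000:.2f} meV")
    print(f"metric   = {metric * 1000:.2f} meV")
    # Band type, gap, and tau below are placeholders, not reported values.
    print(passes_screen(is_indirect=False, egap_pbesol=0.3, tau=4.0, metric=metric))
```

For a 5/16 : 11/16 binary mix, the entropy term at 298 K is on the order of 16 meV per B-site, which shows why it can tip marginal compositions below zero in the combined metric.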
Key Findings
- Model performance (test set on 20-atom systems):
  • ΔHdecomp regression: MAE ≈ 0.449 meV/atom, high parity (R² ≈ 1.000).
  • Bandgap regression: MAE ≈ 0.037 eV, RMSE ≈ 0.061 eV, R² ≈ 0.986.
  • Band type classification: accuracy ≈ 0.96, precision ≈ 0.84, recall ≈ 0.90, F1 ≈ 0.87.
- Entropy and composition effects: Including −TΔSmix (298 K) shifts the ΔHdecomp distributions to more negative values. Stability improves with higher B-site mixing and with Ge content, whereas Zn tends to destabilize. Cs at the A-site favors stability; I at the X-site tends to raise ΔHdecomp (less stable) but lowers the bandgap.
- Large-scale exploration: 41,400 compositions and ~5.6×10^6 configurations were exhaustively evaluated with CGCNN. CGCNN stability predictions for the 80-atom validation structures showed errors ≤6 meV/atom versus PBEsol. However, band-structure generalization deteriorated at the larger cell: of the 110 CGCNN-selected non-indirect candidates, 79 were indirect at the PBEsol level, and the Egap MAE increased to ~0.14 eV.
- Candidate selection: 110 candidates remained after CGCNN screening. The 31 confirmed direct-gap candidates underwent PBE0+SOC bandgap calculations to find gaps near the target ranges for single-junction (≈1.2–1.4 eV) or tandem top cells (~1.73 eV). Ten top compounds were identified with favorable gaps, stability, effective masses, and SLME (examples from Table 1):
  • CsGe0.3125Sn0.6875I3: PBE0 gap ≈ 1.34 eV; suggested for single-junction PV.
  • CsGe0.0625Pb0.3125Sn0.625Br3: PBE0 gap ≈ 1.73 eV; suggested for tandem top cell.
  • Additional examples: CsGe0.5625Sn0.4375Br3 (PBE0 ≈ 1.39 eV), CsPb0.3125Sn0.6875I3 (≈1.73 eV), CsGe0.375Pb0.625I3 (≈1.77 eV), etc.
  Effective masses were generally < 1 m0 (exceptions noted for two Cl-containing Hg/Cd quaternaries). SLME values at 1 µm film thickness reached ~36% for certain Br/I systems.
- Experimental comparison: Across 19 reported perovskites, PBE0+SOC bandgaps show RMSE ≈ 0.30 eV vs experiment. Notably, PBE0 gaps for CsGexSn1−xBr3 underestimate experiment by ~0.5–0.6 eV; for other systems the underestimate is ~0.1 eV.
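The composition counts quoted above can be reproduced with simple combinatorics. The sketch below is a reconstruction under stated assumptions, not the authors' procedure: it assumes each of the k chosen B-site elements occupies at least one of the 16 sites, that binary and ternary mixes are counted for all 3 × 3 A/X combinations, and that quaternary mixes are counted for Cs with all three halides. The function name is hypothetical.

```python
from math import comb

N_SITES = 16        # B-sites in the 80-atom cell
N_B_ELEMENTS = 6    # Ge, Sn, Pb, Zn, Cd, Hg

def n_compositions(k: int, n_sites: int = N_SITES, n_elements: int = N_B_ELEMENTS) -> int:
    """Count B-site compositions with exactly k elements, each on >= 1 site.

    comb(n_elements, k) picks the elements; comb(n_sites - 1, k - 1) counts
    the ways to split n_sites into k positive integer occupancies.
    """
    return comb(n_elements, k) * comb(n_sites - 1, k - 1)

binary = n_compositions(2)
ternary = n_compositions(3)
quaternary = n_compositions(4)

n_a, n_x = 3, 3  # A in {Cs, K, Rb}, X in {Br, Cl, I}
quaternary_cs_only = 1 * n_x * quaternary           # assumed: A fixed to Cs
total = n_a * n_x * (binary + ternary) + quaternary_cs_only

if __name__ == "__main__":
    print(binary, ternary, quaternary)
    print(quaternary_cs_only, total)
```

Under these assumptions the quaternary subtotal and the grand total match the 20,475 and 41,400 figures in the text, which suggests (but does not confirm) that single-element B-site compositions were excluded from the count.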
Discussion
The framework addresses the challenge of navigating vast compositional and configurational spaces by using CGCNN as a surrogate to exhaustively evaluate all atomic configurations per composition. This allows the lowest-energy configuration at each composition to be identified (to within the accuracy of the surrogate) and avoids the biases of random or SQS-only sampling. Integrating mixing entropy and the Bartel tolerance factor provides a more realistic assessment of thermodynamic stability and perovskite formability. The approach narrows 41,400 compositions to 110 screened candidates and then to ten top compounds with near-ideal bandgaps and favorable transport and absorption metrics, highlighting CsGe0.3125Sn0.6875I3 for single-junction and CsGe0.0625Pb0.3125Sn0.625Br3 for tandem top cells. However, while stability predictions transfer well to larger cells, bandgap and band-type predictions degrade outside the training domain, indicating the need for domain adaptation (e.g., active learning) for electronic properties. The validated candidates and structure–property insights (e.g., the stabilizing role of Ge, Egap trends with halides and B-site elements) are relevant for experimental synthesis and device design.
Conclusion
This study presents a DFT/ML pipeline that: (1) trains CGCNN models on 3,159 PBEsol-relaxed B-site-alloyed ABX3 perovskites, (2) exhaustively searches 41,400 compositions and ~5.6×10^6 configurations at 1/16 composition resolution, (3) incorporates configurational entropy and formability (τ) into stability screening, and (4) validates electronic properties with PBE0+SOC, optical absorption, and SLME. From 110 screened candidates, 31 direct-gap materials and 10 top performers were identified, with CsGe0.3125Sn0.6875I3 (single-junction) and CsGe0.0625Pb0.3125Sn0.625Br3 (tandem top) recommended. Future work should improve electronic property predictions via active learning and inclusion of larger-cell training data, extend alloying to A- and X-sites, and integrate defect energetics, carrier transport, and interfacial stability to better reflect device performance.
Limitations
- Transferability of band structure predictions: CGCNN models trained on 20-atom cells show larger errors on 80-atom systems; many predicted non-indirect gaps turned out to be indirect upon PBEsol validation, and the Egap MAE increased to ~0.14 eV.
- Bandgap accuracy vs experiment: PBE0+SOC still deviates (RMSE ~0.30 eV), notably underestimating CsGexSn1−xBr3 by ~0.5–0.6 eV.
- Stability metric scope: ΔHdecomp − TΔSmix does not account for processing-related oxidation of Ge/Sn to +4 states, which can reduce device Voc, nor for kinetic factors.
- Missing physics: Does not include defect formation energies and levels, detailed charge transport beyond effective masses, surface/interface stability, or degradation pathways.
- Chemical space constraints: Only B-site alloying was explored exhaustively; the A- and X-site species were fixed (unalloyed) in each search, and their alloying should be incorporated in future studies.