Chemistry

Universal machine learning aided synthesis approach of two-dimensional perovskites in a typical laboratory

Y. Wu, C. Wang, et al.

This study by Yilei Wu and colleagues presents a general framework that combines small-scale high-throughput experiments with machine learning to guide the synthesis of 2D silver/bismuth organic-inorganic hybrid perovskites. The approach raised the experimental success rate roughly fourfold, from 16.4% to 61.5%, pointing toward more efficient and resource-conscious materials synthesis.

Introduction

The study addresses the challenge of accelerating materials synthesis in typical laboratories where experimental resources, precursor availability, and the number of testable conditions are limited. While machine learning has successfully predicted material properties and structures in data-rich domains, guiding experimental synthesis remains difficult due to sparse, imbalanced, and biased datasets (e.g., literature favoring successful syntheses and lack of failed experiments). The gap between first-principles predictions and realizable syntheses further complicates translation to the lab. The authors aim to develop a practical, interpretable ML framework that can work with small, imbalanced datasets to guide synthesis of two-dimensional hybrid organic-inorganic perovskites, focusing on Ag/Bi iodide systems. The purpose is to integrate small-scale high-throughput experiments, chemically informed descriptors, subgroup discovery to identify domains of applicability, and a simple yet robust classifier to predict synthesis feasibility, thereby improving experimental success rates compared to trial-and-error and intuition.

Literature Review

The paper surveys prior applications of ML in materials science, noting successes in predicting formability and properties, and the development of closed-loop robotic experimentation frameworks. However, these often require extensive data or high experimental cost and struggle with imbalanced datasets. The authors highlight the prevalence of small datasets and class imbalance leading to over/underfitting and limited extrapolation. Techniques like over-sampling (e.g., SMOTE) and under-sampling have been proposed but can perform poorly on certain classes. In perovskites, prior work combined high-throughput synthesis with ML to classify structures (e.g., Sun et al.) and to accelerate single-crystal discovery (Kirman et al.). A recent rulebook reviewed how spacers influence 2D perovskite structure and performance, motivating descriptor choices. The authors position their approach as integrating data-mining (subgroup discovery), interpretable ML, and chemically informed descriptors to handle biased, sparse datasets and improve synthesis guidance for 2D HOIPs.

Methodology

- Data generation: Small-scale high-throughput synthesis of Ag/Bi iodide 2D hybrid perovskites under fixed experimental conditions (inorganic precursors, solvent, concentration, temperature). A total of 80 amines were tested as potential organic spacers, yielding 14 positive (2D perovskite formed; 13 synthesized in this work plus 1 from literature) and 66 negative samples. Single-crystal XRD confirmed structures; PXRD confirmed phase purity; UV–vis diffuse reflectance provided optical bandgaps.
- Descriptor engineering: Started with common RDKit physicochemical descriptors; MolWt and third-order kappa shape index (kappa-3) correlated with synthesis outcomes. A geometry-derived width (y) of the organic spacer (from a rigid sphere model) also correlated strongly. To avoid expensive quantum chemical inputs, problem-specific graph-theoretic descriptors were developed to capture interactions relevant to 2D perovskite formation, focusing on hydrogen bonding and sterics: Num_N (number of nitrogen atoms; spacer valence and RP vs DJ propensity), DiS_NN (max topological distance between nitrogen atoms), STEI (steric effect index around nitrogen), Num_rot (rotatable bonds on the alkyl tail), Eccentricity (branching/topological extent), and MolWt.
- Subgroup discovery: To counter dataset bias and define the model’s domain of applicability, subgroup discovery identified a subdomain in descriptor space with the most informative class separation. In the (y, kappa-3) plane, the most interesting subdomain was approximately y in 496–546 pm and kappa-3 in 1.07–1.82 (with tolerance acknowledging basis-set-dependent geometry variations). Molecules in this region are predominantly 5- or 6-membered rings, suggesting cyclic spacers better stabilize the more rigid, smaller AgBiI6 octahedral cage compared to flexible linear spacers.
- Classifier: Several classifiers were compared on the identified subdomain (LRC, DTC, GBC, SVC). A linear-kernel Support Vector Classification (SVC) model was selected for its accuracy and interpretability. Training used 10-fold cross-validation. Performance was evaluated by ROC AUC and confusion matrix. The linear SVC coefficients provided a transparent, additive scoring function.
- Global screening equation: To extend predictions beyond the subdomain while excluding out-of-domain cases, the linear SVC was combined with step and trigonometric gating terms on y and kappa-3, yielding a closed-form score P as a weighted sum of features plus gating functions. Higher P indicates higher synthesis feasibility; P > 0 denotes a predicted 2D perovskite.
- Model interpretation: SHAP analysis quantified global and local feature contributions, ranking feature importance and indicating directions of influence on synthesis feasibility.
- Virtual screening: 8,406 candidate amines were collected from PubChem based on similarity to training/test molecules. t-SNE was used for visualization of high-dimensional descriptors. The equation P screened 344 candidates as high-feasibility 2D perovskite spacers. Of these, 123 were commercially available.
- Experimental validation: Thirteen commercially available spacers without reactive functional groups (e.g., OH, ether) were unbiasedly selected from the 123 candidates and subjected to synthesis under the same conditions. Structures were confirmed by single-crystal XRD and PXRD; optical properties by UV–vis. DFT (VASP, HSE06) was used to compute electronic structures for selected compounds.
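
The graph-theoretic descriptors listed above can be illustrated with a short sketch. It is not the authors' implementation: it assumes a molecule is given as a connected heavy-atom graph (a list of element symbols plus a list of bonds), counts nitrogens for Num_N, and takes DiS_NN as the largest shortest-path bond count between any two nitrogens. The `max_eccentricity` value is only one plausible reading of "Eccentricity" (the largest graph eccentricity over all atoms); the summary does not give the exact definition, and STEI and Num_rot are not attempted here. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, start):
    """Shortest-path bond counts from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def topological_descriptors(elements, bonds):
    """Illustrative descriptors for a connected heavy-atom graph.

    elements: list of element symbols, indexed by atom number.
    bonds: list of (i, j) index pairs, one per bond.
    """
    adj = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    nitrogens = [i for i, el in enumerate(elements) if el == "N"]
    all_dist = {i: bfs_distances(adj, i) for i in adj}

    # DiS_NN: maximum topological distance between nitrogen atoms
    # (0 when the molecule has fewer than two nitrogens).
    dis_nn = max(
        (all_dist[a][b] for a in nitrogens for b in nitrogens), default=0
    )
    # Assumed definition: largest eccentricity of any atom (graph diameter).
    max_ecc = max(max(d.values()) for d in all_dist.values())

    return {"Num_N": len(nitrogens), "DiS_NN": dis_nn, "max_eccentricity": max_ecc}


# 1,4-diaminobutane as a heavy-atom chain: N-C-C-C-C-N
chain = topological_descriptors(
    ["N", "C", "C", "C", "C", "N"],
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
)
# Piperazine as a six-membered ring with nitrogens at positions 0 and 3
ring = topological_descriptors(
    ["N", "C", "C", "N", "C", "C"],
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
)
print(chain)  # DiS_NN is 5 for the chain
print(ring)   # DiS_NN is 3 for the ring
```

Closing the chain into a ring shortens the nitrogen-nitrogen distance from 5 to 3 bonds, which is the kind of topological difference between linear and cyclic spacers that such descriptors are meant to expose.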

Key Findings

- Dataset and structures: Of 80 tested amines, 14 were positive for forming 2D AgBi iodide perovskites (13 synthesized here plus 1 literature), with RP (A4AgBiI8) and DJ (A2AgBiI8) phases observed, all with single-layer structures.
- Optical properties: Initial 13 synthesized 2D AgBi perovskites exhibited indirect bandgaps of 1.84–1.99 eV. Additional validated perovskites showed similar bandgaps (e.g., 1.76–2.03 eV across examples).
- Subdomain and spacer preference: Subgroup discovery identified a favorable region in width y (≈496–546 pm) and kappa-3 (≈1.07–1.82). Molecules in this region are primarily 5- or 6-membered rings, indicating cyclic spacers better stabilize the rigid, small AgBiI6 framework versus linear spacers.
- Classifier performance: Linear SVC achieved ROC AUC ≈ 0.85, misclassifying only 1 of 10 2D perovskites in cross-validation within the subdomain. Coefficients indicated Num_rot positively correlates with feasibility, while DiS_NN, STEI, Eccentricity, Num_N, and MolWt had negative coefficients.
- Interpretable insights: SHAP ranked feature importance as Num_rot (most important), followed by Eccentricity and STEI. Higher Num_rot increases feasibility; higher DiS_NN, STEI, Eccentricity, Num_N, and MolWt decrease feasibility. Case studies highlighted Num_rot and STEI as decisive local factors.
- Predictive equation and screening: The closed-form equation P integrating SVC coefficients and domain-gating terms screened 344 of 8,406 PubChem spacers as high-feasibility candidates; 123 were commercially available.
- Experimental validation: 13 candidates were tested; 8 formed 2D AgBi perovskites (61.5% success), a roughly fourfold (about 3.75×) improvement over the chemist-intuition baseline (16.4%). Electronic structures showed CBM dominated by Bi p and I p, VBM by Ag d and I p with slight Bi s contribution, explaining indirect bandgaps.
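
To make the structure of the feasibility score concrete, here is a minimal sketch of a linear score with a domain gate. The weights and bias are hypothetical placeholders, not the paper's fitted coefficients; only their signs follow the findings above (Num_rot positive, the rest negative). The gate is a plain product of step functions on y and kappa-3, a simplification of the step-plus-trigonometric terms the authors describe, and the out-of-domain penalty is likewise an assumption.

```python
# Hypothetical placeholder weights: only the SIGNS follow the summary;
# the magnitudes and the bias are illustrative, not fitted values.
WEIGHTS = {
    "Num_rot": 1.0,
    "DiS_NN": -0.5,
    "STEI": -0.5,
    "Eccentricity": -0.5,
    "Num_N": -0.5,
    "MolWt": -0.01,
}
BIAS = 3.0  # hypothetical intercept

# Subdomain bounds reported by subgroup discovery.
Y_RANGE_PM = (496.0, 546.0)
KAPPA3_RANGE = (1.07, 1.82)

# Assumed penalty that forces out-of-domain molecules to score below zero.
OUT_OF_DOMAIN_PENALTY = 1.0e6


def step(value, lower, upper):
    """Return 1.0 inside [lower, upper], otherwise 0.0."""
    return 1.0 if lower <= value <= upper else 0.0


def feasibility_score(features, y_pm, kappa3):
    """Gated linear score; a positive value means 'predicted 2D perovskite'."""
    linear = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    in_domain = step(y_pm, *Y_RANGE_PM) * step(kappa3, *KAPPA3_RANGE)
    return linear - OUT_OF_DOMAIN_PENALTY * (1.0 - in_domain)


# Made-up descriptor values for illustration only.
example = {
    "Num_rot": 2, "DiS_NN": 0, "STEI": 1,
    "Eccentricity": 4, "Num_N": 1, "MolWt": 100.0,
}
print(feasibility_score(example, y_pm=520.0, kappa3=1.4) > 0)  # inside the gate
print(feasibility_score(example, y_pm=600.0, kappa3=1.4) > 0)  # outside the gate
```

Keeping the score additive is what allows each coefficient to be read directly as the direction in which a descriptor pushes feasibility, which is the interpretability argument made for the linear SVC.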

Discussion

The integrated framework directly addresses the central challenge of synthesizing new 2D HOIPs with limited, biased data by combining high-throughput experiments, chemically informed descriptors, and a domain-aware interpretable classifier. Subgroup discovery mitigated class imbalance and bias by restricting modeling to a statistically informative subdomain, improving accuracy and yielding physically meaningful descriptor thresholds. The linear SVC provided a transparent, closed-form feasibility score, facilitating understanding of how molecular topology and sterics modulate hydrogen bonding and accommodate the rigid AgBiI6 lattice. Compared with standard imbalance-handling techniques (e.g., SMOTE, CondensedNearestNeighbour, EasyEnsembleClassifier), the proposed subdomain-trained model maintained better performance on non-2D negatives. Experimentally, the approach increased synthesis hit rate to 61.5% from 16.4%, validating practical utility. The framework outputs probability-like feasibility estimates and is flexible to integrate alternative kernels or models, though interpretability is prioritized to support scientific insight and theory development. The insights (favoring cyclic spacers with low steric hindrance and adequate tail flexibility) are consistent with the smaller, stiffer AgBiI6 cage and are expected to generalize within similar inorganic frameworks and conditions.

Conclusion

The work presents a universal, interpretable ML-aided synthesis framework for 2D HOIPs that functions effectively with small, imbalanced datasets. By uniting high-throughput synthesis data, subgroup discovery to define domains of applicability, problem-specific graph/topology descriptors, and a linear SVC classifier, the authors derive a closed-form feasibility equation. Screening 8,406 candidates yielded 344 promising spacers, and 8 of 13 tested predictions formed 2D AgBi perovskites, quadrupling the success rate over intuition-driven trials. Physicochemical insights highlight that cyclic spacers with low nitrogen steric hindrance, fewer branches, and sufficient rotatable bonds favor formation within the rigid, compact AgBiI6 framework. Future work could extend the framework to other inorganic backbones and functional targets (e.g., ferroelectricity, chirality), incorporate optimization of experimental parameters (temperature, solvent), and expand training data to refine domain bounds and model generalizability.
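
The headline numbers can be checked with a few lines of arithmetic. The sketch below uses only counts quoted in this summary; the 16.4% baseline is taken as reported rather than recomputed, and the variable names are illustrative.

```python
def rate(successes, trials):
    """Fraction of successes among trials."""
    return successes / trials


flagged_fraction = rate(344, 8406)    # candidates flagged as high-feasibility
available_fraction = rate(123, 344)   # flagged candidates that are purchasable
hit_rate = rate(8, 13)                # validated predictions that formed 2D phases

BASELINE_HIT_RATE = 0.164             # chemist-intuition baseline, as reported
improvement = hit_rate / BASELINE_HIT_RATE

print(f"flagged:     {flagged_fraction:.1%}")
print(f"available:   {available_fraction:.1%}")
print(f"hit rate:    {hit_rate:.1%}")
print(f"improvement: {improvement:.2f}x")
```

The ratio comes out a little under four (about 3.75), so "quadrupling" is a rounded description of the improvement.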

Limitations

- Small, imbalanced dataset (80 total samples; 14 positives) limits model scope and increases risk of bias; subgroup discovery mitigates but does not eliminate this.
- Fixed synthesis conditions (precursors, solvent, concentration, temperature) restrict generalizability; predictions may vary under different conditions.
- The gating boundaries (e.g., width y) are sensitive to molecular geometry optimization details (basis-set dependence), requiring tolerance in boundary application.
- The closed-form equation was initially valid within a specific subdomain; out-of-domain exclusion is handled via step/trigonometric terms, but extrapolation to very different chemistries may be unreliable.
- Only the Ag/Bi iodide inorganic framework was studied; transferability to other frameworks needs validation.
- Experimental validation was limited to 13 commercially available candidates; many predicted candidates were not testable due to availability or reactive functional groups (e.g., hydroxyl, ether) under chosen conditions.
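
One way to apply the boundary tolerance mentioned above is to label molecules near an edge as borderline instead of rejecting them outright. The sketch below is only an illustration of that idea: the 5 pm tolerance and the three labels are assumptions, not values or categories from the paper.

```python
def classify_against_bounds(value, lower, upper, tolerance):
    """Label a descriptor value relative to [lower, upper] with a soft margin."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    if lower <= value <= upper:
        return "inside"
    if lower - tolerance <= value <= upper + tolerance:
        return "borderline"
    return "outside"


# Width y in pm against the reported 496-546 pm window,
# with a hypothetical 5 pm tolerance.
for width_pm in (500.0, 493.0, 549.0, 480.0):
    print(width_pm, classify_against_bounds(width_pm, 496.0, 546.0, 5.0))
```

Borderline molecules could then be re-examined with a second geometry optimization before being excluded, since small changes in the computed width may move them across the boundary.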
