Introduction
The discovery of advanced functional materials is crucial for addressing global challenges. However, material synthesis is a complex, multi-dimensional process involving parameters such as precursors, additives, solvents, concentration, and temperature. Traditional approaches often rely on trial and error and are limited by the resources and expertise available in a typical laboratory, which significantly hampers exploration of the vast chemical space for novel materials. Data-driven machine learning (ML) techniques have shown promise in accelerating materials discovery, particularly in structure-property prediction. However, applying ML to guide experimental synthesis is challenging because it requires large, balanced datasets, which are often unavailable: existing datasets tend to be biased towards successful syntheses and lack information on failed attempts. First-principles calculations can provide data, but a significant gap often exists between theoretical predictions and experimental results. Automated synthesis frameworks exist, but they can be expensive and time-consuming, which limits their applicability in resource-constrained labs.
This research addresses these challenges by developing a universal framework that combines small-scale high-throughput experiments with ML techniques specifically designed to handle limited and imbalanced datasets, focusing on the synthesis of 2D hybrid organic-inorganic perovskites (HOIPs). 2D HOIPs are attractive because of their enhanced stability, superior optical and electronic properties, and ease of fabrication, which makes the development of novel 2D HOIPs a high-priority research area.
Literature Review
Several studies have explored the use of ML in materials discovery. For instance, Mroz et al. (2022) highlighted the role of computation in exploring uncharted material space. Zhou et al. (2018) created a library of atomically thin metal chalcogenides. Shields et al. (2021) demonstrated Bayesian reaction optimization for chemical synthesis, and Zuranski et al. (2021) focused on predicting reaction yields via supervised learning. Butler et al. (2018) reviewed machine learning for molecular and materials science. Previous work has applied ML successfully to property prediction (Lu et al., 2020; Lu et al., 2022) and to predicting material formability (Bartel et al., 2019; Wu et al., 2021; Choubisa et al., 2020), but the use of ML to guide experimental synthesis remains limited. Studies such as Kirman et al. (2020) and Sun et al. (2019) demonstrated the potential of combining high-throughput experiments with ML, but such approaches often rely on extensive resources. The current research aims to bridge this gap by developing a framework applicable to typical laboratories with limited resources.
Methodology
This study uses a framework that integrates high-throughput experiments, chemical intuition, and ML techniques. First, high-throughput synthesis experiments were conducted on 79 commercially available amines to explore the synthesis feasibility of 2D silver/bismuth iodide perovskites. The experiments used consistent conditions (inorganic precursors, solvent, concentration, and temperature) to minimize variability and reduce experimental cost, and yielded a dataset of 80 samples (14 successful, 66 unsuccessful). To make effective use of this imbalanced dataset, the researchers employed subgroup discovery, a data-mining technique, to identify a subdomain of the dataset in which the distribution of successful and unsuccessful syntheses was balanced. This subdomain provided a more reliable basis for training machine learning models.
The physicochemical properties of the organic spacers were quantified using a set of descriptors including molecular weight (MolWt), the third-order kappa shape index (<sup>3</sup>κ), and width (*y*), with the width derived from a rigid sphere model developed in previous work. The researchers visualized the distribution of these descriptors (MolWt, <sup>3</sup>κ, *y*) in 3D scatter plots and 2D projections, using color-coding to distinguish successful from unsuccessful syntheses. Subgroup discovery identified a subdomain, characterized by a favorable range of *y* and <sup>3</sup>κ values, in which the synthesis success rate is significantly higher. Within this subdomain, the researchers developed problem-specific descriptors focused on the interactions between the organic spacers and the inorganic layers in the 2D perovskite structures. These included the distance between nitrogen atoms (DiS<sub>NN</sub>), the steric effect index (STEI) of nitrogen, the number of nitrogen atoms (Num<sub>N</sub>), the number of rotatable bonds in the alkyl tail (Num<sub>Rot</sub>), and the eccentricity of the organic spacer molecule.
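To make the subgroup discovery step more concrete, here is a minimal sketch of the general idea: search for an axis-aligned box in two descriptors where the success rate is well above the overall rate. The data points and labels are invented for illustration. The quality function used here, weighted relative accuracy (WRAcc), is a common choice in subgroup discovery but is an assumption; the summary does not say which quality function or search strategy the study used.

```python
from itertools import combinations_with_replacement

# Hypothetical toy data: (width y, third-order kappa index) for each amine.
points = [(4.2, 1.1), (4.8, 1.5), (5.5, 1.8), (5.0, 1.3),
          (4.5, 2.6), (7.5, 1.2), (8.1, 3.0), (3.0, 3.5),
          (6.9, 2.8), (2.5, 0.9), (9.0, 1.6), (5.2, 3.9)]
# 1 = 2D perovskite formed, 0 = synthesis failed (invented labels).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def wracc(labels_inside, n_total, base_rate):
    """Weighted relative accuracy: coverage * (subgroup rate - overall rate)."""
    if not labels_inside:
        return 0.0
    coverage = len(labels_inside) / n_total
    rate = sum(labels_inside) / len(labels_inside)
    return coverage * (rate - base_rate)


def best_box(points, labels):
    """Exhaustively search boxes lo <= value <= hi on both descriptors.

    Candidate bounds are the observed descriptor values, so the search is
    only practical for small datasets like the one sketched here.
    """
    n_total = len(points)
    base_rate = sum(labels) / n_total
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    best_quality, best_bounds = float("-inf"), None
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            inside = [lab for (x, y), lab in zip(points, labels)
                      if x_lo <= x <= x_hi and y_lo <= y <= y_hi]
            quality = wracc(inside, n_total, base_rate)
            if quality > best_quality:
                best_quality, best_bounds = quality, (x_lo, x_hi, y_lo, y_hi)
    return best_quality, best_bounds


quality, box = best_box(points, labels)
print(f"WRAcc = {quality:.3f}, box (y_lo, y_hi, kappa_lo, kappa_hi) = {box}")
```

WRAcc trades off subgroup size against purity, so it avoids selecting a tiny box around a single success. A model trained only on samples inside the returned box sees a far less skewed class distribution than the full dataset provides.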
These descriptors, along with the previously identified descriptors, were used to train a Support Vector Classification (SVC) model with a linear kernel. The SVC model was chosen for its interpretability and its efficiency in handling small datasets. To reduce the risk of overfitting, 10-fold cross-validation was used. Model performance was evaluated using the area under the Receiver Operating Characteristic curve (AUC) and confusion matrices. SHAP (SHapley Additive exPlanations) analysis was used to interpret the model's predictions and to understand the relative importance of each feature. Finally, the trained model was used to predict the synthesis feasibility of unexplored molecules from the PubChem database, with further screening based on commercial availability. Selected predicted compounds were then synthesized and characterized to validate the model's predictive accuracy.
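The two evaluation metrics named above can be sketched in a few lines of plain Python. The scores below are invented stand-ins for the signed decision values that a linear classifier would assign to held-out samples during cross-validation; they are not results from the study, and the zero threshold is an assumption matching the usual sign convention for a linear SVC.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(scores, labels, threshold=0.0):
    """Return (tn, fp, fn, tp), predicting 'success' when score > threshold."""
    tn = fp = fn = tp = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical held-out decision values and true outcomes (1 = success).
scores = [2.1, 0.4, -0.3, 1.2, -1.5, -0.8, 0.1, -2.0]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

print("AUC:", roc_auc(scores, labels))
print("(tn, fp, fn, tp):", confusion_counts(scores, labels))
```

AUC depends only on how the samples are ranked, so it is not inflated by always predicting the majority class, which makes it a reasonable metric when failures outnumber successes. The confusion matrix complements it by showing how errors split between missed successes and false alarms at one particular threshold.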
Key Findings
The high-throughput experiments revealed that only 13 out of 79 (16.4%) tested amines successfully formed 2D AgBi iodide perovskites. Subgroup discovery identified a region in the descriptor space (width *y* and <sup>3</sup>κ) where the synthesis success rate was significantly improved. The SVC model, trained on this region, achieved an AUC of 85%, indicating good classification performance despite the small, imbalanced dataset. SHAP analysis highlighted the importance of molecular topology, particularly the number of rotatable bonds in the alkyl tail (Num<sub>Rot</sub>), eccentricity, and steric hindrance (STEI) of nitrogen atoms. The analysis revealed that cyclic organic spacers with fewer branches and rotatable bonds were more favorable for 2D AgBi perovskite formation. This is attributed to the relatively rigid inorganic framework of the AgBiI<sub>6</sub> octahedra, which prefers organic spacers with less flexibility to maintain structural stability. The study generated a prediction equation that identified 344 molecules, out of a database of 8406, as candidate spacers for 2D AgBi perovskites. Experimental validation on 13 selected commercially available compounds supported the model's predictive power: 8 of them (61.5%) formed 2D perovskites. This is a significant improvement over the 16.4% success rate based solely on chemical intuition. The synthesized perovskites exhibited indirect bandgaps in the range of 1.76-2.03 eV, consistent with the materials in the training set, which supports the generalizability of the model.
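Because the classifier is linear, its prediction equation and its SHAP attributions have a simple closed form. The sketch below shows that relationship, assuming features are treated as independent: the SHAP value of a feature is then its weight times the feature's deviation from the background mean. All weights, the bias, and all descriptor values are placeholders invented for illustration; they are not the study's fitted equation, and their signs and magnitudes carry no chemical meaning.

```python
def decision_value(weights, bias, x):
    """Linear prediction equation f(x) = sum_i w_i * x_i + b."""
    return sum(weights[name] * x[name] for name in weights) + bias


def linear_shap(weights, background_mean, x):
    """SHAP values for a linear model with independent features:
    phi_i = w_i * (x_i - mean_i)."""
    return {name: weights[name] * (x[name] - background_mean[name])
            for name in weights}


# Placeholder coefficients (NOT the published equation).
weights = {"num_rot": -0.8, "eccentricity": -0.3, "stei": -0.5}
bias = 1.0
# Placeholder mean descriptor values over a hypothetical training set.
background_mean = {"num_rot": 3.0, "eccentricity": 0.6, "stei": 1.2}
# Placeholder descriptors for one candidate organic spacer.
candidate = {"num_rot": 1.0, "eccentricity": 0.4, "stei": 1.0}

phi = linear_shap(weights, background_mean, candidate)
f_candidate = decision_value(weights, bias, candidate)
f_average = decision_value(weights, bias, background_mean)

for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name:>12}: {value:+.2f}")
# Additivity: the attributions sum to f(candidate) - f(average molecule).
print(f"sum(phi) = {sum(phi.values()):+.2f}, "
      f"f(x) - f(mean) = {f_candidate - f_average:+.2f}")
```

The additivity check at the end is what makes the attributions readable as a decomposition: each descriptor's contribution says how far it pushes this molecule's score above or below that of an average molecule in the training set.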
Discussion
The study's findings demonstrate the effectiveness of the proposed ML-aided synthesis framework in addressing the challenges of materials discovery with limited resources. The integration of high-throughput experiments, chemical insights, and advanced ML techniques, particularly the use of subgroup discovery to handle imbalanced datasets, significantly improved the efficiency and success rate of synthesizing 2D AgBi iodide perovskites. The developed prediction equation provides a valuable tool for rapidly screening potential candidates for synthesis. The interpretability of the ML model, aided by SHAP analysis, offered valuable insights into the structure-property relationships, revealing the crucial role of molecular topology in determining the synthesis feasibility of 2D AgBi perovskites. These insights can guide future material design efforts. The high success rate of experimental validation (61.5%) strongly supports the model's reliability and generalizability. The successful synthesis and characterization of novel 2D AgBi perovskites expanded the library of available materials for various optoelectronic applications.
Conclusion
This research presented a novel ML-aided synthesis framework that accelerates the discovery of 2D AgBi iodide perovskites. By combining high-throughput experiments, chemical intuition, subgroup discovery, and interpretable ML models, the framework addresses the data sparsity and imbalance issues common in experimental materials science. The approach resulted in a nearly fourfold increase in synthesis success rate (61.5% versus 16.4%) compared to conventional methods. The model offers both strong predictive power and interpretability, and can guide future research in the design and synthesis of novel functional materials.
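The improvement factor follows directly from the figures reported under Key Findings. A short check, using only those reported numbers (the variable names are illustrative):

```python
baseline_rate = 0.164            # reported success rate of the initial experiments
validated, attempted = 8, 13     # validation syntheses that formed 2D perovskites
shortlisted, screened = 344, 8406  # molecules passing the prediction equation

validation_rate = validated / attempted
fold_improvement = validation_rate / baseline_rate
shortlist_fraction = shortlisted / screened

print(f"validation success rate: {validation_rate:.1%}")
print(f"improvement over baseline: {fold_improvement:.2f}x")
print(f"fraction of database shortlisted: {shortlist_fraction:.1%}")
```

The ratio comes out slightly below four, and it rests on only 13 validation syntheses, so the improvement factor is best read as approximate.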
Limitations
While the proposed framework achieved a significant improvement in synthesis success rate, some limitations exist. The model's predictive power is specific to the synthesis conditions used in this study, so altering these conditions (e.g., temperature, pressure, solvent) might reduce its accuracy. The reliance on commercially available compounds limited the exploration of the full chemical space. The current dataset, while larger than those of many previous studies in this area, is still relatively small. Expanding it with more diverse compounds and synthesis conditions would further enhance the model's robustness and generalizability. Further research could apply the framework to other 2D HOIP systems and evaluate other machine learning algorithms and feature sets to refine the model.