Chemistry

Predicting the synthesizability of crystalline inorganic materials from the data of known material compositions

E. R. Antoniuk, G. Cheon, et al.

SynthNN is a deep learning model, developed by Evan R. Antoniuk and colleagues, that predicts the synthesizability of inorganic crystalline materials from their chemical formulas. It outperforms charge-balancing, a DFT surrogate model, and human experts in precision, and it is far faster than the experts.

Introduction

The discovery of new materials is crucial for scientific advancement, but identifying which candidate materials can actually be synthesized remains a significant challenge. Organic synthesis can draw on established reaction mechanisms, whereas inorganic synthesis lacks well-understood reaction pathways. Synthesizability depends on thermodynamic and kinetic factors, but also on non-physical ones such as cost, equipment availability, and the perceived importance of the target material. At present these decisions rest with expert solid-state chemists, which limits the speed of materials exploration. Charge-balancing is a common but insufficient proxy for synthesizability. Existing ab initio and machine learning methods, such as those that use density-functional theory (DFT) to calculate formation energies, struggle to account for kinetic stabilization and non-physical factors, and correctly predict synthesizability in only about 50% of cases. This work aims to develop a more accurate and efficient method for predicting synthesizability.

Literature Review

Several approaches have been used to predict synthesizability. Charge-balancing is simple and computationally inexpensive: it checks only whether a formula can reach a net neutral ionic charge. However, it is accurate in only a small fraction of cases (37% of synthesized inorganic materials). DFT-based methods calculate formation energies on the assumption that a synthesizable material has no thermodynamically stable decomposition products; because they do not account for kinetic stabilization, their reach is limited. Machine learning models have also been developed. Some take crystal structure as input, but that structural information is not always available for undiscovered materials. Composition-based models avoid this requirement but cannot distinguish between different crystal structures of the same composition. Data mining techniques have been used to extract synthesis recipes from the literature, but they do not assess synthesizability directly. This work develops a deep learning approach intended to overcome these limitations.
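
To make the charge-balancing proxy concrete, here is a minimal sketch in Python. It assumes each element takes a single oxidation state per formula, and the oxidation-state table is a deliberately tiny, hypothetical one chosen for illustration; it is not the table or procedure used by the authors.

```python
from itertools import product

# Hypothetical, deliberately small table of common oxidation states.
# A real screen would need a far more complete table.
COMMON_OXIDATION_STATES = {
    "Na": [1],
    "Cl": [-1],
    "Fe": [2, 3],
    "O": [-2],
}

def is_charge_balanced(composition):
    """Return True if some choice of one listed oxidation state per
    element gives a net charge of zero.

    composition maps element symbol -> integer count,
    e.g. {"Fe": 2, "O": 3} for Fe2O3.
    """
    elements = list(composition)
    state_options = [COMMON_OXIDATION_STATES[el] for el in elements]
    # Try every combination of one oxidation state per element.
    for states in product(*state_options):
        net = sum(q * composition[el] for q, el in zip(states, elements))
        if net == 0:
            return True
    return False
```

With this table, `is_charge_balanced({"Fe": 2, "O": 3})` is true, while `is_charge_balanced({"Fe": 3, "O": 4})` is false because no single iron oxidation state balances the formula, even though magnetite is a well-known mixed-valence oxide. Cases like this are one reason a simple charge-balancing filter misses real materials.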

Methodology

The authors developed SynthNN, a deep learning classification model that predicts the synthesizability of inorganic chemical formulas directly, without requiring structural information. The model uses the atom2vec framework, which represents each chemical formula through a learned atom embedding matrix that is optimized during training, so no assumptions need to be made in advance about which factors influence synthesizability. The training data come from the Inorganic Crystal Structure Database (ICSD), a comprehensive collection of synthesized inorganic materials. Because unsuccessful syntheses are rarely reported, the dataset is augmented with artificially generated unsynthesized formulas. Some of these generated formulas may in fact be synthesizable, so the labels are incomplete. To address this, a semi-supervised approach treats the unsynthesized materials as unlabeled data and probabilistically reweights them according to their likelihood of being synthesizable; this places the method in the family of positive-unlabeled (PU) learning algorithms. SynthNN is a 3-layer deep neural network with hyperbolic tangent activations in the first two layers and a softmax activation in the final layer. It is trained with the Adam optimizer and a cross-entropy loss. Hyperparameters are tuned by grid search, optimizing the area under the precision-recall curve (AUC). Model performance is evaluated with standard classification metrics: precision, recall, accuracy, and F1-score. The effect of varying the ratio of synthesized to unsynthesized training examples is also analyzed.
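
The architecture described above can be sketched as a forward pass plus a weighted loss. This is an illustrative sketch only, not the authors' code. It assumes a formula is featurized as the fraction-weighted sum of its element embeddings, omits bias terms, uses random rather than trained parameters, and leaves out the Adam training loop. The element vocabulary, layer sizes, and example weights are all hypothetical.

```python
import math
import random

ELEMENTS = ["Na", "Cl", "Fe", "O"]  # hypothetical toy vocabulary
EMBED_DIM, HIDDEN = 4, 5            # hypothetical layer sizes

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# These would be learned during training; here they are random placeholders.
embedding = rand_matrix(len(ELEMENTS), EMBED_DIM)  # one row per element
W1 = rand_matrix(EMBED_DIM, HIDDEN)
W2 = rand_matrix(HIDDEN, HIDDEN)
W3 = rand_matrix(HIDDEN, 2)

def matvec(x, W):
    """Multiply row vector x by matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def featurize(composition):
    """Fraction-weighted sum of element embeddings (an assumed featurization)."""
    total = sum(composition.values())
    vec = [0.0] * EMBED_DIM
    for el, count in composition.items():
        row = embedding[ELEMENTS.index(el)]
        for k in range(EMBED_DIM):
            vec[k] += (count / total) * row[k]
    return vec

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict_proba(composition):
    """Two tanh layers, then softmax over [unsynthesized, synthesized]."""
    h1 = [math.tanh(v) for v in matvec(featurize(composition), W1)]
    h2 = [math.tanh(v) for v in matvec(h1, W2)]
    return softmax(matvec(h2, W3))

def pu_weighted_cross_entropy(batch):
    """Weighted cross-entropy over (composition, label, weight) triples."""
    loss = sum(-w * math.log(predict_proba(c)[y]) for c, y, w in batch)
    return loss / sum(w for _, _, w in batch)
```

One common way to express PU reweighting with such a loss is to enter each unlabeled formula twice, once with label 1 and weight w and once with label 0 and weight 1 - w, where w is an estimate of how likely the formula is to be synthesizable. For example, `pu_weighted_cross_entropy([({"Na": 1, "Cl": 1}, 1, 1.0), ({"Fe": 1, "Cl": 1}, 1, 0.2), ({"Fe": 1, "Cl": 1}, 0, 0.8)])` treats NaCl as a known positive and FeCl as unlabeled. Whether the authors' reweighting takes exactly this form is an assumption here.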

Key Findings

SynthNN outperforms the existing methods it was compared against. Its precision in identifying synthesizable materials is 12 times higher than random guessing and 2.6 times higher than charge-balancing. Against Roost, a composition-based machine learning model used as a surrogate for DFT calculations, SynthNN achieves 7 times higher precision at the same recall. In a head-to-head comparison with 20 expert materials scientists on a Synthesizability Quiz of 100 randomly selected formulas (9 synthesized and 91 unsynthesized), SynthNN outperformed every expert, reaching a precision of 0.667 against 0.444 for the best human expert, and completed the task five orders of magnitude faster. Analysis of its predictions shows that SynthNN implicitly learns chemical principles such as charge-balancing, which it applies more often to ionic compounds. The model also reflects chemical family relationships and uses chemical analogy in its predictions. Performance improves when the model is trained on larger and more diverse datasets. SynthNN can also predict the synthesizability of materials first synthesized in the decades after its training data; recall falls for more distant decades, as expected, which indicates that the model performs best on materials similar to those it was trained on.
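
The quiz figures can be put in context with a little arithmetic. The sketch below computes the precision a random guesser would expect on the quiz (the fraction of formulas that are synthesized) and compares the two reported precisions. The true-positive and false-positive counts are hypothetical: this summary reports only the precision values, and the counts were chosen because they reproduce 0.667 and 0.444.

```python
def precision(tp, fp):
    """Fraction of formulas flagged as synthesizable that really are."""
    return tp / (tp + fp)

N_TOTAL, N_SYNTHESIZED = 100, 9  # quiz composition stated in the text

# A random guesser's expected precision equals the positive fraction.
random_precision = N_SYNTHESIZED / N_TOTAL

# Hypothetical counts, chosen only to match the reported precisions.
synthnn_precision = precision(tp=6, fp=3)
best_expert_precision = precision(tp=4, fp=5)

# How much higher SynthNN's precision is than the best expert's.
ratio = synthnn_precision / best_expert_precision
```

Under these assumptions `random_precision` is 0.09, and `ratio` comes out at about 1.5, so the reported gap between SynthNN and the best expert corresponds to roughly one and a half times the expert's precision on this quiz.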

Discussion

SynthNN's superior performance addresses the need for a more accurate and efficient method for predicting synthesizability. Its ability to outperform both established computational methods and human experts highlights the potential of deep learning for accelerating materials discovery. The implicit learning of chemical principles by SynthNN is particularly notable, demonstrating the model's capacity to capture complex relationships within chemical data. The findings suggest that reformulating materials discovery as a synthesizability classification task is a promising strategy. The significant speed advantage of SynthNN over DFT-based methods is crucial for high-throughput screening of vast chemical spaces. The model's performance in predicting future materials demonstrates its robustness and potential to guide future research efforts.

Conclusion

SynthNN is a significant advance in predicting the synthesizability of inorganic materials. Its accuracy, speed, and ability to learn chemical principles make it a valuable tool for computational materials discovery. Future work could expand the training dataset, incorporate additional data sources such as synthesis conditions, and develop strategies to improve predictions for materials that differ significantly from those in the training set. Further work is also needed on the model's interpretability and on its integration with other computational tools for property prediction.

Limitations

The study's reliance on the ICSD dataset introduces a potential bias, as the database may not represent the full spectrum of synthesizable materials. The generation of unsynthesized materials is an approximation and may not perfectly capture the distribution of truly unsynthesizable compositions. The model's performance may be limited in predicting materials with highly unusual or unconventional bonding characteristics. SynthNN performs better on materials similar to those in the training data; performance degrades when predicting materials that are significantly different.
