Predicting Synthesizability of Crystalline Materials via Deep Learning

A. Davariashtiyani, Z. Kadkhodaie, et al.

A deep-learning model uses three-dimensional images of crystal structures to predict the synthesizability of hypothetical crystals. The research, conducted by Ali Davariashtiyani, Zahra Kadkhodaie, and Sara Kadkhodaei, applies this approach to identifying viable materials for battery electrodes and thermoelectric applications.

Introduction
Predicting the synthesizability of novel crystalline materials is crucial for accelerating materials discovery. Traditional methods rely on expert knowledge and trial and error, while recent computational and machine learning methods offer greater predictive capability. Existing computational approaches often focus on thermodynamic free energies, using the energy of the amorphous solid as a benchmark for synthesizability. This benchmark has limitations: it cannot reliably predict low-energy unsynthesizable or high-energy synthesizable crystals, and it is specific to a given chemical composition. Machine learning methods have also been explored, but existing models often lack generality, being limited to specific crystal structure types or chemical compositions. This study addresses these limitations by developing a deep learning model capable of predicting synthesizability across various crystal structures and chemical compositions.
Literature Review
Several pioneering studies have attempted to quantify synthesizability using thermodynamic free energies. Sun et al. (2016) showed that energy above the ground state is not a reliable metric of synthesizability. Aykol et al. (2018) proposed using the amorphous solid energy as an upper bound for the Gibbs energy of synthesizable crystals, but this bound is composition-specific. Machine learning approaches have also been employed, such as Hautier et al.'s (2010, 2011) probabilistic model for ion substitution and Ryan et al.'s (2018) neural network using atomic fingerprints. Other studies use expert-knowledge-based parameters derived from natural language processing of scientific literature (Kim et al., 2017) or from failed experiments (Raccuglia et al., 2016). However, these methods often lack generality or accuracy across diverse crystal structure types and chemical compositions.
Methodology
This research presents a deep-learning model that predicts synthesizability by leveraging structural and chemical features simultaneously. Crystalline materials are represented as three-dimensional, color-coded images in which voxel color encodes atomic number, periodic row, and group number. This image representation allows the use of convolutional neural networks (CNNs) for feature learning. The model employs two approaches: supervised and unsupervised feature learning. In the supervised approach, a CNN encoder is directly connected to a neural network classifier and learns features during classification. In the unsupervised approach, a convolutional auto-encoder (CAE) learns a latent space representation from unlabeled images, which is then used as input to a separate neural network classifier. The data set consists of 3000 synthesizable crystals from the Crystallography Open Database (COD) and 600 crystal anomalies. Crystal anomalies are defined as unobserved crystal structures for compositions that are frequently studied in the literature, selected using a natural language processing model. The model's performance is evaluated using metrics such as accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC-AUC).
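The image representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a cubic grid, places each atom in a single voxel at its fractional coordinate, and scales the three channels (atomic number, periodic row, group number) to the range 0 to 1. The function name `voxelize`, the `ELEMENTS` lookup table, and the grid size `n` are hypothetical names introduced for illustration only.

```python
import numpy as np

# Hypothetical lookup table: symbol -> (atomic number, periodic row, group number).
ELEMENTS = {"Mo": (42, 5, 6), "S": (16, 3, 16)}

# Assumed normalization: largest atomic number, row, and group in the periodic table.
CHANNEL_SCALE = np.array([118.0, 7.0, 18.0], dtype=np.float32)


def voxelize(symbols, frac_coords, n=16):
    """Return an (n, n, n, 3) image; each occupied voxel holds scaled (Z, row, group)."""
    image = np.zeros((n, n, n, 3), dtype=np.float32)
    for symbol, frac in zip(symbols, frac_coords):
        # Wrap fractional coordinates into [0, 1) to respect periodicity,
        # then map them to integer voxel indices.
        wrapped = np.asarray(frac, dtype=float) % 1.0
        i, j, k = np.floor(wrapped * n).astype(int) % n
        image[i, j, k] = np.array(ELEMENTS[symbol], dtype=np.float32) / CHANNEL_SCALE
    return image


# Usage: three atoms at illustrative (not crystallographic) positions.
image = voxelize(
    ["Mo", "S", "S"],
    [(0.0, 0.0, 0.0), (0.5, 0.5, 0.25), (0.5, 0.5, 0.75)],
    n=8,
)
```

An array of this shape could be passed to a 3D convolutional encoder. A real pipeline would also need to handle lattice geometry and atomic size, which this sketch ignores.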
Key Findings
The supervised CNN classifier achieves a ROC-AUC of 0.981 and 93.7% accuracy, while the unsupervised CAE+MLP classifier achieves a ROC-AUC of 0.968 and 91.9% accuracy. The importance of feature learning is demonstrated by comparing these results to a model using raw images as input, which suffers from overfitting. The CAE+MLP model shows better generalization than the CNN classifier when applied to datasets of electrode and thermoelectric materials. For electrode materials (2088 samples), the CAE+MLP classifier predicts synthesizability with 89% accuracy for COD samples and 85% synthesizability for non-COD samples. For thermoelectric materials (122 samples), the CAE+MLP classifier achieves 78.6% recall on COD samples. The model is also applied to predict the synthesizability of different molybdenum disulfide (MoS2) polymorphs, aligning with Pauling's rules and experimental observations. The synthesizability likelihood is shown to correlate with volumetric capacity and average voltage in electrode materials, offering potential for materials selection.
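For readers unfamiliar with the reported metrics, the following sketch shows how accuracy, sensitivity, specificity, and ROC-AUC can be computed from binary labels and predicted synthesizability likelihoods. The function names, the 0.5 decision threshold, and the toy data are assumptions for illustration; they are not taken from the study.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity (recall on positives), and specificity at a threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (made up): 1 = synthesizable, 0 = crystal anomaly.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]
metrics = confusion_metrics(labels, scores)
auc = roc_auc(labels, scores)
```

The pairwise ROC-AUC formulation is quadratic in the number of samples, which is acceptable for a data set of a few thousand crystals; a rank-based implementation would scale better.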
Discussion
The model's strength lies in its ability to combine generality and accuracy in predicting synthesizability across diverse crystal structures and compositions. However, the black-box nature of the CNN and CAE models limits interpretability. Future work will focus on using methods such as layer-wise relevance propagation to extract more understandable features. The model's performance is compared to existing methods: Aykol et al.'s energy-based threshold model and Ryan et al.'s deep network model. Unlike energy-based approaches that use a single thermodynamic parameter, the deep learning model accounts for complex chemical and structural patterns, accurately predicting low-energy unsynthesizable and high-energy synthesizable crystals. In contrast to Ryan et al.'s model focused on site substitution probability, this approach offers more global structural and chemical pattern representation, leveraging the inherent translational symmetry of crystals through convolutional encoders.
Conclusion
This study presents a deep-learning model that accurately predicts the synthesizability of crystalline materials across diverse crystal structures and chemical compositions. The better generalization of the CAE+MLP classifier, demonstrated through its application to electrode and thermoelectric materials, underscores the value of unsupervised feature learning in this context. Future work will focus on improving interpretability and extending the model's applicability to a wider range of materials and synthesis conditions.
Limitations
The model's accuracy relies on the quality and representativeness of the training data, which might be limited by the availability of experimentally verified synthesizable and unsynthesizable crystals. The black-box nature of the neural network models makes feature interpretation challenging. The model's performance could be affected by factors such as kinetic limitations and synthesis routes not explicitly considered in the current framework. Furthermore, the relatively small sample size in the thermoelectric dataset might affect the statistical significance of the findings for that application.