Predicting the synthesizability of crystalline inorganic materials from the data of known material compositions

Chemistry

E. R. Antoniuk, G. Cheon, et al.

Explore the groundbreaking SynthNN, a deep learning model developed by Evan R. Antoniuk and colleagues, that revolutionizes the prediction of synthesizability in inorganic crystalline materials, outpacing traditional methods and experts alike in speed and accuracy.

Introduction
The paper addresses the central challenge of predicting whether a crystalline inorganic composition is synthesizable—defined as synthetically accessible given current capabilities—irrespective of whether it has been synthesized yet. Inorganic materials synthesis lacks well-understood reaction mechanisms, and practical decisions depend not only on thermodynamic/kinetic factors but also cost, equipment, and perceived value. Common proxies such as charge balancing and thermodynamic stability (via DFT formation energies) are insufficient: only about 37% of known inorganic materials can be charge balanced using common oxidation states, and DFT stability captures roughly 50% of synthesized crystalline inorganics, failing to account for kinetic stabilization and broader non-physical considerations. The study proposes reframing materials discovery as a synthesizability classification task over composition space and introduces SynthNN, a deep-learning model trained on known synthesized compositions and augmented unsynthesized examples, to predict synthesizability directly from stoichiometry without structural inputs.
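The charge-balancing proxy critiqued above can be made concrete with a minimal checker: a composition counts as charge balanced if some assignment of common oxidation states sums to zero total charge. This is an illustrative sketch, not the paper's implementation, and the oxidation-state table below is a small hypothetical subset.

```python
from itertools import product

# Illustrative common oxidation states (a tiny subset, not the paper's table).
COMMON_OX = {"Li": [1], "Na": [1], "Fe": [2, 3], "O": [-2], "Cl": [-1]}

def is_charge_balanced(composition):
    """Return True if some assignment of common oxidation states
    makes the total charge of the composition zero."""
    elements = list(composition)
    counts = [composition[e] for e in elements]
    for states in product(*(COMMON_OX[e] for e in elements)):
        if sum(q * n for q, n in zip(states, counts)) == 0:
            return True
    return False

print(is_charge_balanced({"Fe": 1, "Cl": 2}))  # Fe2+ with 2 Cl- -> True
print(is_charge_balanced({"Li": 1, "Cl": 2}))  # no neutral assignment -> False
```

A filter like this is what rejects roughly 63% of known synthesized compositions, which is the paper's motivation for learning synthesizability directly.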
Literature Review
Prior approaches include: (1) Charge-balancing filters based on common oxidation states, often used as a proxy for synthesizability; however, their predictive ability is poor and has degraded over time as more complex stoichiometries are synthesized. (2) DFT-based assessments using formation energies and energy above the convex hull to infer thermodynamic stability; these methods miss many synthesized compounds due to kinetic/metastability effects and require crystal structures as input. (3) Machine learning models trained to predict formation energies or stability from composition (e.g., Roost and others), which serve as surrogates for DFT but still optimize for stability rather than synthesizability. (4) Structure-based ML methods for synthesizability that require known crystal structures, limiting applicability to hypothetical materials. (5) Literature mining to extract synthesis recipes, which provides synthesis routes but not synthesizability judgments. The field has increasingly adopted positive–unlabeled (PU) learning to handle the scarcity of labeled negatives in vast chemical spaces.
Methodology
Data: Positive (synthesizable) examples are 53,594 unique binary, ternary, and quaternary compositions from the ICSD (extracted October 2020) without fractional coefficients: 8,194 binaries, 26,218 ternaries, and 19,182 quaternaries. Negative/unlabeled examples are artificially generated compositions matched to the class mix and elemental prevalence of the positives: the number of unique elements is chosen to match the binary/ternary/quaternary ratios, elements are sampled according to their frequency among the positives, integer stoichiometric coefficients are sampled uniformly from 1–20, and any composition that matches (or is an integer multiple, up to 20×, of) a known synthesized composition or a previously generated negative is excluded. These generated examples are treated as unlabeled (potentially positive) within a positive–unlabeled (PU) learning framework.

Representation (Atom2Vec): Each composition is a normalized 94×1 vector of atomic fractions. A learned 94×M atom-embedding matrix (M is a hyperparameter) is elementwise multiplied with the composition vector to form a 94×M embedded representation, which is then averaged over rows to yield an M-dimensional input to the classifier. The embedding matrix is learned jointly with the classifier.

Model: A three-layer neural network implemented in TensorFlow 1.12.0. The first two hidden layers use tanh activations; the output layer uses softmax for binary classification. Hidden-layer sizes are hyperparameters sampled from {30, 40, 50, 60, 80}. Optimization uses Adam with a learning rate in {2e-2, 5e-3, 2e-3, 5e-4, 2e-4} and a cross-entropy loss.

Training protocol: Data are split 90/5/5 (train/validation/test) for a given ratio N_synth of generated (unsynthesized) to synthesized examples (values up to 20). Stage 1 is supervised training for N_init steps (sampled from {2e4, 4e4, 6e4, 8e4, 1e5}) with generated examples treated as negatives. Stage 2 applies PU reweighting (Elkan & Noto): each unlabeled instance is duplicated as one positive and one negative copy, with class weights equal to the estimated class-membership probabilities, and training continues for 8×10^5 steps. Final parameters are selected at peak validation accuracy. Hyperparameters are tuned via grid search optimizing area under the precision–recall curve (PR-AUC) on a validation set with N_synth = 20, with at least 20 runs per (M, N_synth) combination and the other hyperparameters randomized.

Evaluation: Standard classification metrics are computed with synthesized as positive and generated as negative; because of label noise in the negatives, the measured precision and F1 are lower bounds on their true values, while recall is unaffected. Benchmarks include charge balancing (predict positive if the composition is charge-neutral under common oxidation states) and a composition-based surrogate for DFT stability (Roost) trained on Materials Project energy-above-hull data and thresholded across Ehull cutoffs to form precision–recall curves. Human benchmarking uses a 100-item quiz (9 synthesized, 91 generated) administered to 20 experts; SynthNN selects its top-9 predicted positives for comparison.

Interpretability: SynthNN outputs are analyzed across stoichiometric grids for selected chemical families to examine learned charge-balancing behavior, and hidden-layer embeddings are visualized with t-SNE to assess clustering by chemical family and analogy.

Code/data availability: SynthNN code and Roost predictions are available at the provided GitHub links; the energy-above-hull dataset is available in the repository; ICSD data are obtainable from the ICSD.
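The negative-sampling protocol above can be sketched in a few lines. This is a minimal illustration, not the authors' released code; the element frequencies and the toy "known" set in the usage example are hypothetical.

```python
import random
from math import gcd
from functools import reduce

def canonical(comp):
    """Reduce stoichiometric coefficients by their gcd so a composition
    and its integer multiples map to the same key (e.g. Li4O2 -> Li2O)."""
    g = reduce(gcd, comp.values())
    return tuple(sorted((e, c // g) for e, c in comp.items()))

def sample_negative(element_freq, known, n_elements, max_coeff=20, rng=random):
    """Sample one artificial 'unsynthesized' composition: elements drawn
    in proportion to their frequency among synthesized materials, integer
    coefficients uniform on 1..max_coeff, rejecting anything that matches
    a known composition (up to an integer multiple) or a prior negative."""
    elements = list(element_freq)
    weights = [element_freq[e] for e in elements]
    while True:
        els = set()
        while len(els) < n_elements:
            els.add(rng.choices(elements, weights=weights)[0])
        comp = {e: rng.randint(1, max_coeff) for e in els}
        key = canonical(comp)
        if key not in known:
            known.add(key)  # record it so later negatives cannot repeat it
            return comp

# Hypothetical usage: toy frequencies and a one-entry "ICSD".
freq = {"Li": 0.4, "Fe": 0.3, "O": 0.3}
known = {canonical({"Li": 2, "O": 1})}
neg = sample_negative(freq, known, n_elements=2, rng=random.Random(0))
print(neg)
```

The gcd reduction is one simple way to implement the paper's "integer multiple" exclusion rule; the generated examples are then handed to the PU-learning stage as unlabeled data.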
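The Atom2Vec-style representation can likewise be sketched directly from its description: scale each atom's embedding row by its atomic fraction, then average over the 94 rows. The element indices, embedding width, and random matrix below are illustrative stand-ins; in SynthNN the embedding matrix is learned jointly with the classifier.

```python
import numpy as np

N_ELEMENTS = 94  # number of elements considered

def embed_composition(atom_fracs, embedding):
    """Map a normalized composition vector (N_ELEMENTS,) through a
    (N_ELEMENTS, M) atom-embedding matrix: elementwise-scale each
    atom's embedding row by its fraction, then average over rows to
    produce the M-dimensional classifier input."""
    weighted = atom_fracs[:, None] * embedding   # (N_ELEMENTS, M)
    return weighted.mean(axis=0)                 # (M,)

# Toy example: a fake "Li2O" with Li at index 2 and O at index 7
# (hypothetical indices; a real mapping would follow atomic number).
rng = np.random.default_rng(0)
M = 8                                  # embedding width (a hyperparameter)
embedding = rng.normal(size=(N_ELEMENTS, M))
x = np.zeros(N_ELEMENTS)
x[2], x[7] = 2 / 3, 1 / 3              # atomic fractions sum to 1
h = embed_composition(x, embedding)
print(h.shape)                         # (8,)
```

Because the composition vector is zero for absent elements, only the rows of the elements actually present contribute to the averaged representation.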
Key Findings
- SynthNN vs. charge balancing and random baselines: SynthNN detects synthesized materials with 2.6× higher precision than charge balancing and 12× higher than random guessing on a test set with N_synth = 20 (2,410 synthesized and 48,199 generated compositions). Charge balancing's predictive accuracy has declined over time as synthesized stoichiometries have become more complex.
- SynthNN vs. DFT surrogate (Roost): Roost achieves a maximum F1 of 0.12 at an Ehull cutoff of 0.05 eV, with 69% recall and 6.8% precision. At the same recall (69%), SynthNN attains 46.6% precision, roughly 7× higher, demonstrating the advantage of directly classifying synthesizability over regressing energy above hull.
- Human expert comparison: In a 100-item quiz containing 9 synthesized compositions, SynthNN identified 6 of 9 correctly; the best human expert identified 4 of 9. Across 267 quizzes sampled from the test set, SynthNN averaged 5.84 correct versus the best human's 4.00. SynthNN produced predictions in milliseconds versus roughly 30 minutes per human response, a speedup of about 10^5. Human experts overestimated the synthesizability of binaries (31% of their picks versus a 15% prevalence among synthesized materials), and their performance dropped markedly for d-/f-block elements; SynthNN's performance remained steady across blocks.
- Data augmentation with generated negatives improves performance: Increasing the number of artificially generated unsynthesized compositions (higher N_synth) improved precision and overall performance, analogous to data augmentation in image classification.
- Temporal generalization: Training SynthNN on all materials reported before 2010 yielded 80% recall on materials synthesized in 2010–2019, at a decision threshold calibrated to achieve 5× random precision on in-distribution validation data. Recall drops for decades farther in the future, indicating distribution-shift effects, but improves as more historical data are included.
- Learned chemistry: SynthNN implicitly learns charge balancing for ionic systems (sharp peaks at charge-balanced stoichiometries for the Li–Cl and Li–O families) while relaxing it for covalent systems (broader ranges for Li–Ge). It differentiates FeCl2 from FeCl3 in line with Fe2+/Fe3+ stability. t-SNE of the hidden embeddings clusters compositions by chemical family and analogy (e.g., Li2S with Li2Se), indicating learned chemical-family relationships and ionicity without explicit periodic-table inputs.
- Dataset statistics of charge balancing: Only ~37% of synthesized inorganic compositions in the ICSD are charge balanced under common oxidation states; even among ionic binary cesium compounds, only ~23% are charge balanced. The fraction of charge-balanced discoveries decreased from ~84% (1920s–1930s) to ~38% (2010–2020).
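The Roost comparison above can be sanity-checked against the harmonic-mean definition of F1, using the precision and recall values quoted in the findings:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Roost surrogate at Ehull cutoff 0.05 eV (precision 6.8%, recall 69%):
print(round(f1(0.068, 0.69), 2))   # 0.12, matching the reported maximum F1
# SynthNN at the same 69% recall with 46.6% precision:
print(round(f1(0.466, 0.69), 2))   # 0.56
```

The roughly 7× precision gap at matched recall is what translates into far fewer failed synthesis attempts per true discovery.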
Discussion
By reframing materials discovery as a synthesizability classification task over composition space, SynthNN captures a broader, more realistic set of factors that govern whether a composition can be synthesized than proxies like thermodynamic stability or charge balancing. The model’s substantially higher precision at comparable recall to DFT surrogates directly translates to fewer false positives and fewer failed synthesis attempts, increasing the efficiency of experimental efforts. Its superiority to individual expert judgments and comparable performance to the aggregate wisdom of experts, coupled with massive speed advantages, suggests SynthNN can guide high-throughput screening and inverse design workflows at scale. The model’s interpretability analyses show it learns chemically sensible principles (charge neutrality for ionic systems, family relationships, ionicity) and applies them flexibly depending on the material class, explaining its robustness across diverse chemistries. Temporal analyses indicate good performance on near-future materials with some degradation under distribution shift; as training data grow and diversify, SynthNN’s ability to generalize to novel regions of chemical space is expected to improve. Integrating SynthNN into screening and generative pipelines can expand the search beyond existing databases to the full space of compositions while maintaining realistic synthesizability constraints.
Conclusion
The study introduces SynthNN, a composition-only deep neural network trained with positive–unlabeled learning to predict the synthesizability of inorganic crystalline materials. SynthNN outperforms charge-balancing heuristics, DFT-based stability surrogates, and human experts, while operating orders of magnitude faster. It learns interpretable, chemically grounded rules (e.g., charge balancing in ionic materials, chemical-family analogies) and retains high recall on later discoveries when trained only on historical data. These results demonstrate that directly learning synthesizability from the distribution of known compositions is an effective strategy for reliable materials discovery across vast chemical spaces. Future directions include improving out-of-distribution robustness (e.g., domain adaptation across decades), enriching negative-sampling strategies, incorporating structural or process-condition priors when available, and tighter integration with generative models to co-optimize target properties and synthesizability.
Limitations
- Label noise in negatives: Artificially generated “unsynthesized” examples may include compositions that are actually synthesizable or are later synthesized. This leads to underestimation of the measured precision, F1, and PR-AUC; recall remains unaffected.
- Composition-only representation: Without structural information, SynthNN cannot distinguish polymorphs or structure-specific effects; some synthesis outcomes depend critically on structure and processing.
- Distribution shift: Performance degrades when predicting far-future discoveries relative to in-distribution data, indicating sensitivity to shifts in chemistry trends over time.
- Generated negatives: Random coefficient generation (1–20) favors detection by charge balancing and may not fully capture realistic unsynthesizable regions; improved negative generation could further enhance training.
- External dependencies: Benchmarking against DFT via a surrogate (Roost) depends on the coverage and biases of the Materials Project dataset and the quality of its energy-above-hull predictions.
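The first limitation can be made concrete with a toy calculation (the counts below are hypothetical): if some predicted positives that are labeled "unsynthesized" are in fact synthesizable, the measured precision understates the true precision, while recall over the labeled positives is untouched.

```python
def measured_precision(tp, fp):
    """Precision over the labeled test set: TP / (TP + FP)."""
    return tp / (tp + fp)

# Hypothetical counts: of 100 predicted positives, 60 carry a
# "synthesized" label (TP) and 40 carry an "unsynthesized" label (FP).
tp, fp = 60, 40
print(measured_precision(tp, fp))  # 0.6, the reported lower bound

# If 10 of those 40 "negatives" are actually synthesizable but unlabeled,
# the true precision is higher than what the labels allow us to measure:
hidden_positives = 10
print(measured_precision(tp + hidden_positives, fp - hidden_positives))  # 0.7
```

This is why the paper reports its precision, F1, and PR-AUC figures as lower bounds on the true values.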