Scaling deep learning for materials discovery

A. Merchant, S. Batzner, et al.

This research by Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk shows how graph neural networks trained at scale can accelerate materials discovery. Starting from roughly 48,000 known stable crystals, the approach identifies 2.2 million structures that lie below the previously known convex hull, including complex materials with elemental combinations not reported before.

Introduction

The study addresses the long-standing challenge of efficiently discovering energetically favorable inorganic crystal structures. Traditional experimental discovery has catalogued only a fraction of stable crystals because of cost and throughput limitations, while first-principles computational databases (e.g., Materials Project, OQMD) have expanded coverage but still fall short in large, combinatorial chemical spaces. The authors hypothesize that graph neural networks (GNNs), scaled up and trained on large datasets curated iteratively through active learning, can accurately predict materials stability (decomposition energy relative to the convex hull) and thereby enable efficient, large-scale discovery beyond previous human-guided approaches. The work aims to develop and validate a scalable machine learning framework, graph networks for materials exploration (GNOME), that combines diverse candidate generation with high-accuracy energy prediction. The goal is to accelerate discovery and unlock downstream modeling capabilities such as transferable interatomic potentials, with applications ranging from layered materials to solid electrolytes.
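
The stability criterion mentioned above, decomposition energy relative to the convex hull, can be illustrated for a two-element system. The following is a minimal sketch, not the authors' code: it assumes a hypothetical binary system A-B, made-up formation energies in eV/atom, and illustrative function names. Real phase diagrams span many elements and are usually built with dedicated tools.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (fraction of element B, formation energy in eV/atom)


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of (x, E) points (lower half of Andrew's monotone chain)."""
    # Keep only the lowest-energy phase at each composition.
    best: Dict[float, float] = {}
    for x, e in points:
        best[x] = min(e, best.get(x, e))
    hull: List[Point] = []
    for p in sorted(best.items()):
        # Drop the previous vertex while it lies on or above the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull: List[Point], x: float) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (x - x1) / (x2 - x1) * (y2 - y1)
    raise ValueError("composition lies outside the range covered by the hull")


def decomposition_energy(candidate: Point, competing: List[Point]) -> float:
    """Candidate energy minus the hull of competing phases (negative: below the hull)."""
    x, e = candidate
    return e - hull_energy(lower_hull(competing), x)


# Made-up competing phases: pure A, A3B, AB, pure B.
known = [(0.0, 0.0), (0.25, -0.1), (0.5, -0.4), (1.0, 0.0)]
hull = lower_hull(known)                                # A3B is not a hull vertex
e_a3b = decomposition_energy((0.25, -0.1), known)       # positive: above the hull
e_new = decomposition_energy((0.75, -0.3), known)       # negative: below the prior hull
```

In this toy system the hypothetical candidate at x = 0.75 sits below the hull of the known phases, which is the sense in which a structure counts as stable relative to previous work.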

Literature Review

Prior work established the landscape of inorganic materials through experimental databases such as ICSD and computational repositories including the Materials Project (MP), OQMD, AFLOWLIB, and NOMAD. Earlier computational discovery leveraged density functional theory (DFT) with substitution heuristics and prototype enumeration to identify stable compounds, growing the set of computationally stable materials to around 48,000. However, machine-learning approaches have struggled to reliably predict decomposition energies relative to the convex hull, limiting utility for stability screening. Advances in deep learning demonstrate strong scaling and emergent capabilities in other domains (language, vision, biology), suggesting potential benefits for materials science if sufficient data and compute are available. The paper builds on these insights and prior databases, introducing new candidate generation methods and scaled GNN models to overcome previous limitations in stability prediction and exploration, particularly for higher-order compositions (4+ elements) that are challenging for human-guided discovery.

Methodology

The approach combines large-scale candidate generation with iterative active learning using graph neural networks (GNOME) and first-principles validation.

  • Candidate generation: Two complementary pipelines.
    1. Structural pipeline: Start from known crystals and apply diverse modifications, steering toward discovery by (a) reweighting ionic substitution probabilities to prioritize novel chemistries and (b) using symmetry-aware partial substitutions (SAPS) to generate incomplete replacements efficiently. Generated structures are filtered by GNOME predictions using volume-based test-time augmentation and deep-ensemble uncertainty quantification; the surviving candidates are clustered and their polymorphs ranked before DFT evaluation.
    2. Compositional pipeline: Predict stability from composition alone (reduced chemical formulas) using GNOME. Relaxing the strict oxidation-state balancing constraint widens the compositional search. For each selected composition, initialize 100 random structures and perform ab initio random structure searching (AIRSS) to identify low-energy polymorphs, followed by DFT relaxation.
  • GNOME models: Graph neural networks predicting total energies. Inputs encode elements via one-hot embeddings; message passing uses shallow MLPs with swish nonlinearities. For structural models, messages from edges to nodes are normalized by average atomic adjacency across the dataset. Initial training uses a 2018 MP snapshot (~69k materials), improving prior benchmark MAE from 28 to 21 meV/atom using the refined architecture.
  • Active learning loop: In each round, GNOME filters candidates, and the selected structures undergo DFT relaxation (VASP, using the PBE functional with PAW potentials) to obtain energies. These energies both validate the predictions and expand the training set. Models are then retrained on the enlarged dataset, improving accuracy and hit rate. Candidates are thresholded on predicted decomposition energy relative to competing phases (the convex hull). Six rounds are reported, with ensemble models used for uncertainty estimation and selection.
  • Validation: GNOME predictions are compared to experimental ICSD additions and to higher-fidelity r²SCAN meta-GGA computations to assess robustness beyond PAW-PBE. Structural diversity is assessed via prototype clustering, and discovery coverage is analyzed across elemental complexity (e.g., >4 unique elements).
  • Machine-learned interatomic potentials (MLIPs): Pretrain an E(3)-equivariant NequIP potential on energies and forces from the GNOME DFT relaxation trajectories (ionic relaxations) across diverse structures. Evaluate it zero-shot on AIMD data (including unseen compositions, melted structures, and vacancy-containing configurations) without material-specific training; assess scaling, transferability under temperature shifts (e.g., train at 400 K, test at 1000 K), and the ability to classify superionic conductors in high-throughput simulations. Compare to existing general-purpose MLIPs and to NequIP models trained from scratch on target data.
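
The filtering and labeling step of the active-learning loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the GNOME pipeline: the "models" are lookup tables standing in for an ensemble of trained networks, the `oracle` stands in for a DFT relaxation, and the names `active_learning_round`, `threshold`, and `max_std` (including the rule of rejecting high-spread candidates) are hypothetical. Retraining on the new labels is left out.

```python
import statistics
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: candidate id -> decomposition energy in eV/atom.
Model = Callable[[str], float]


def ensemble_predict(models: Sequence[Model], candidate: str) -> Tuple[float, float]:
    """Mean and spread (population std) of the ensemble's predictions."""
    preds = [m(candidate) for m in models]
    return statistics.mean(preds), statistics.pstdev(preds)


def active_learning_round(
    candidates: Sequence[str],
    models: Sequence[Model],
    oracle: Model,
    threshold: float = 0.0,   # assumed cutoff on predicted decomposition energy
    max_std: float = 0.05,    # assumed cutoff on ensemble disagreement
) -> Tuple[Dict[str, float], float]:
    """Filter with the ensemble, label the survivors, and report the hit rate."""
    selected = []
    for c in candidates:
        mean, std = ensemble_predict(models, c)
        if mean < threshold and std <= max_std:
            selected.append(c)
    # The oracle stands in for DFT; its labels would join the training set.
    labels = {c: oracle(c) for c in selected}
    hits = sum(1 for e in labels.values() if e < 0.0)
    hit_rate = hits / len(labels) if labels else 0.0
    return labels, hit_rate


# Made-up numbers for four hypothetical candidates.
true_energy = {"A": -0.05, "B": 0.10, "C": -0.02, "D": 0.03}
m1 = {"A": -0.06, "B": 0.09, "C": -0.01, "D": -0.02}
m2 = {"A": -0.04, "B": 0.11, "C": -0.03, "D": -0.01}
m3 = {"A": -0.05, "B": 0.10, "C": -0.30, "D": -0.03}
models = [m1.__getitem__, m2.__getitem__, m3.__getitem__]

labels, hit_rate = active_learning_round(list(true_energy), models, true_energy.__getitem__)
```

Here B is rejected because its predicted energy is above the threshold, C because the ensemble disagrees, and D is selected but turns out to be unstable, so one of the two labeled candidates is a hit. In a full loop the returned labels would be appended to the training data before the next round.
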
Key Findings
  • Scale of discovery:
    • 2.2 million structures found below the prior convex hull; 381,000 new entries lie on the updated convex hull, bringing the total to ~421,000 stable crystals, an order-of-magnitude expansion over previous work.
    • Substantial gains in higher-order systems: efficient discovery in spaces with >4 unique elements, despite such chemistries being absent from training in earlier rounds.
    • Over 45,500 novel structural prototypes discovered (5.6× increase over ~8,000 in the Materials Project), demonstrating structural diversity beyond simple substitutions or prototype enumeration.
  • Model accuracy and efficiency:
    • GNOME test errors improve with scale, reaching ~11 meV/atom MAE on relaxed structures; initial architecture improved baseline from 28 to 21 meV/atom on MP2018 data.
    • Hit rates (precision for predicted-stable candidates) increased from <6% (structural) and <3% (compositional) initially to >80% and ~33%, respectively, after six active-learning rounds (vs ~1% in prior composition-only approaches).
    • Emergent out-of-distribution generalization demonstrated on structures from random structure search and on compositions with 5+ unique elements.
  • Experimental and higher-fidelity validation:
    • 736 GNOME-discovered structures independently match experimentally realized ICSD entries.
    • Of 3,182 compositions added to MP after the training snapshot, 2,202 appear in the GNOME database, and 91% of those match on structure.
    • Under r²SCAN: 84% of discovered binaries/ternaries show negative phase-separation energies; 86.8% of tested quaternaries remain stable on the r²SCAN convex hull.
  • Application-targeted screening:
    • Layered materials: estimated stable layered count increases from ~1,000 to ~52,000 with GNOME.
    • Solid electrolytes: 528 promising Li-ion conductors identified (≈25× more than in the prior study whose screening criteria were reused).
    • Li/Mn transition-metal oxides: 15 additional candidates stable relative to MP (vs 9 in the original study).
  • Interatomic potentials (MLIPs):
    • Pretrained GNOME NequIP potentials achieve power-law improvements in zero-shot force accuracy with scale and outperform existing general-purpose potentials on benchmark tests.
    • Strong transferability under distribution shift (e.g., training at 400 K, testing at 1000 K) and zero-shot performance competitive with or exceeding models trained on hundreds of target-specific structures.
    • Successfully classifies superionic conductors in MD for 623 unseen compositions, enabling high-throughput electrolyte screening.
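
The power-law scaling reported for model error can be checked with a straight-line fit in log-log space. The sketch below is illustrative only: the data points are synthetic, generated from an exact power law rather than taken from the paper, and `fit_power_law` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of error = a * size**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data: error = 2.0 * size**(-0.25), not measurements from the paper.
sizes = [1e3, 1e4, 1e5, 1e6]
errors = [2.0 * n ** -0.25 for n in sizes]
a, b = fit_power_law(sizes, errors)  # recovers a close to 2.0 and b close to 0.25
```

A fitted exponent b that stays roughly constant as more training data is added is what a power-law scaling claim amounts to; with real, noisy measurements the fit would only be approximate.
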
Discussion

The results demonstrate that scaling GNNs with an active-learning loop and broad candidate generation can overcome historical barriers in predicting materials stability, directly addressing the research goal of accelerating inorganic crystal discovery. By iteratively improving model accuracy and confidence estimates with DFT-validated feedback, GNOME achieves high hit rates and low energy errors, yielding efficient exploration even in combinatorially complex regions (e.g., 5+ elements) where human intuition and prior heuristics struggle. Validation against experimental databases and higher-fidelity r²SCAN functionals indicates that the discovered materials are meaningfully stable and not artifacts of specific simulation settings. Beyond discovery, the large, diverse dataset of DFT relaxations enables training robust, transferable interatomic potentials that generalize zero-shot to unseen materials and conditions. This capability lowers the barrier for high-throughput property prediction (e.g., ionic conductivity) and accelerates screening workflows in applications such as solid-state electrolytes and layered materials. The observed neural scaling laws and emergent out-of-distribution generalization suggest that further expansion of data and compute could continue to improve model universality, moving toward a broadly applicable energy predictor for crystalline materials.

Conclusion

GNNs trained at scale with iterative active learning can accurately predict materials stability and enable efficient discovery, expanding the known set of stable inorganic crystals by over an order of magnitude. The GNOME framework discovered 2.2 million structures below the prior hull and 381,000 new entries on the updated convex hull, with strong validation via experimental matches and r²SCAN. The associated dataset also empowers pretrained, general-purpose interatomic potentials that deliver high zero-shot accuracy and transferability for molecular-dynamics simulations, facilitating rapid property prediction and screening (e.g., superionic conductors). Open challenges include understanding phase transitions among polymorphs, assessing dynamic stability (phonons), accounting for configurational entropies, and improving synthesizability predictions. Future work may extend scaling, enhance uncertainty quantification, integrate higher-fidelity electronic structure methods, and couple discovery with experimental feedback to further accelerate materials innovation.

Limitations
  • Stability assessments rely primarily on DFT with PAW-PBE; while validated with r²SCAN for subsets, discrepancies between functionals can affect convex-hull placement.
  • Discovered materials may be displaced from the convex hull by future discoveries or higher-fidelity calculations; some GNOME entries have already displaced previously ‘stable’ materials in MP/OQMD.
  • Dynamic stability (phonons), finite-temperature effects, and configurational entropies are not comprehensively treated, affecting experimental realizability.
  • Synthesizability (kinetics, processing routes) is not guaranteed by computational stability; experimental validation remains necessary.
  • Compositional models without structure have lower precision than structural models; AIRSS-based structural searches introduce computational cost and may bias accessible polymorph space.
  • Out-of-distribution generalization, though improved, is not universal; performance may degrade on exotic chemistries or extreme conditions not represented in the training data.