Scaling deep learning for materials discovery

Engineering and Technology

A. Merchant, S. Batzner, et al.

This groundbreaking research by Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk shows how scaled graph neural networks can revolutionize materials discovery, uncovering 2.2 million new stable structures and expanding the roughly 48,000 computationally stable crystals known before this work by nearly an order of magnitude. The discoveries include complex materials with elemental combinations never found before!

Introduction
The discovery of new functional materials is crucial for advancements in various technologies, from clean energy to information processing. Traditional trial-and-error methods are expensive and inefficient. While deep learning has shown promise in other fields, its application to materials discovery has been limited by the accuracy and scalability of existing models. This research addresses that challenge by scaling up machine learning techniques for materials exploration, aiming to significantly improve the efficiency and scope of materials discovery. The study focuses on inorganic crystals, an important class of materials with significant technological applications but a historically challenging discovery process. Existing computational methods, such as those employed by the Materials Project and OQMD, have cataloged approximately 48,000 computationally stable materials using density functional theory (DFT) calculations. However, these methods have limitations in predicting stability accurately and in exploring diverse chemical spaces. This paper proposes a novel approach using large-scale active learning and advanced graph neural networks (GNNs) to overcome these limitations and accelerate the discovery of new stable inorganic crystals.
Literature Review
Previous studies have utilized various computational and experimental approaches to discover new materials. Experimental methods, while providing direct validation, are limited by cost and throughput. Computational approaches, primarily based on DFT calculations, have yielded databases such as the Inorganic Crystal Structure Database (ICSD), the Materials Project (MP), and the Open Quantum Materials Database (OQMD). However, these databases remain limited in size, and earlier machine-learning methods proved ineffective at accurately estimating the stability of candidate materials relative to the convex hull of energies of competing phases. The limitations of these existing methods highlight the need for a new approach that combines scalable computation with advanced machine learning to improve the accuracy and efficiency of materials discovery.
Methodology
The research employs a two-pronged approach that combines large-scale active learning with state-of-the-art GNNs. First, diverse candidate structures are generated using symmetry-aware partial substitutions (SAPS) and random structure search. These candidates are then filtered using GNNs, specifically the Graph Networks for Materials Exploration (GNoME) models. The GNoME models are trained iteratively in an active-learning loop: in each round, candidate structures are filtered based on the models' predictions, the energies of the retained candidates are computed using DFT, and the verified results are used to refine the models, creating a data flywheel that progressively improves prediction accuracy. This iterative process is applied to both structural and compositional pipelines: structural models take crystal structures as input, while compositional models use only chemical formulas. The GNoME models are message-passing neural networks that use shallow multilayer perceptrons (MLPs) with swish nonlinearities as aggregate projections, and a key architectural improvement is the normalization of messages passed from edges to nodes. DFT calculations were performed with the Vienna Ab initio Simulation Package (VASP). Performance was measured by the mean absolute error (MAE) of predicted energies, and the precision of stable predictions (hit rate) was compared with previous studies and existing databases such as the Materials Project. The study also uses volume-based test-time augmentation and uncertainty quantification through deep ensembles to improve the robustness and accuracy of the predictions; the active-learning loop and ensemble filter are illustrated in the sketch below.
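To make the filter-verify-retrain flywheel concrete, the following is a minimal sketch of a single active-learning round. It is an illustration under stated assumptions, not the authors' implementation: the ensemble interface (predict/update), the run_dft callable standing in for the VASP workflow, and the candidate list (for example, produced by SAPS or random structure search) are all hypothetical stand-ins.

```python
import numpy as np


def ensemble_predict(models, structures):
    """Deep-ensemble prediction of decomposition energy (eV/atom).

    Each model in `models` is assumed (hypothetically) to expose a
    .predict(structures) method returning one energy per structure.
    The ensemble mean is the prediction; the spread across members is a
    simple uncertainty estimate.
    """
    preds = np.stack([m.predict(structures) for m in models])  # (n_models, n_structures)
    return preds.mean(axis=0), preds.std(axis=0)


def active_learning_round(models, candidates, run_dft, threshold=0.05):
    """One filter -> DFT-verify -> retrain round of the data flywheel.

    `run_dft` is a caller-supplied callable (a stand-in for the VASP
    pipeline) that returns verified energies for the selected structures.
    """
    mean_e, std_e = ensemble_predict(models, candidates)

    # Keep candidates that are plausibly stable: the optimistic estimate
    # (mean minus one standard deviation) falls below a small threshold
    # on the predicted energy above the convex hull.
    keep = (mean_e - std_e) < threshold
    selected = [c for c, k in zip(candidates, keep) if k]

    # Verify the filtered candidates with DFT and fold the labels back in,
    # so the next round filters with more accurate models.
    verified_energies = run_dft(selected)
    for m in models:
        m.update(selected, verified_energies)  # hypothetical retraining hook
    return selected, verified_energies
```

The threshold and the mean-minus-one-sigma selection rule are illustrative choices; the point is that ensemble uncertainty lets the filter err on the side of sending borderline candidates to DFT, which keeps the flywheel generating informative training data.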
Key Findings
The GNoME models achieved unprecedented success in materials discovery. They identified over 2.2 million structures that are stable relative to existing databases, nearly an order-of-magnitude increase over the roughly 48,000 stable crystals previously cataloged. Of these, 381,000 lie on the updated convex hull, indicating genuinely new stable materials. The models exhibit power-law improvement with increasing data size, demonstrating the benefits of scale. Prediction accuracy improved to an MAE of 11 meV atom⁻¹, with hit rates exceeding 80% for structural predictions and 33% for compositional predictions (a significant improvement over the roughly 1% achieved in previous work). The models also showed emergent out-of-distribution generalization, accurately predicting structures with more than four unique elements, a chemical space previously difficult to explore. Validation was performed by comparing predictions with experimental data from the ICSD and with higher-fidelity r²SCAN calculations; 736 of the structures predicted by GNoME were independently realized and verified experimentally. Furthermore, the analysis demonstrates that the GNoME dataset unlocks new modeling capabilities, enabling the training of highly accurate and transferable machine-learned interatomic potentials (MLIPs). These MLIPs showed unprecedented zero-shot generalization, accurately predicting material properties, including for unseen compositions and at different temperatures, and outperforming existing general-purpose potentials as well as models trained specifically on those materials.
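As a concrete note on how the two headline metrics are defined, the sketch below computes an energy MAE and a hit rate (the precision of stable predictions) from predicted and DFT-computed energies above the convex hull. The function and the toy numbers are hypothetical illustrations, not the paper's evaluation code or data.

```python
import numpy as np


def evaluate_screening(pred_e_above_hull, dft_e_above_hull, stable_tol=0.0):
    """Return (MAE in meV/atom, hit rate) for a screened candidate set.

    Both inputs are energies above the convex hull in eV/atom for the
    same candidates: the model's predictions and the DFT-verified values.
    """
    pred = np.asarray(pred_e_above_hull, dtype=float)
    dft = np.asarray(dft_e_above_hull, dtype=float)

    # Mean absolute error of the predicted energies, in meV/atom.
    mae_mev = 1000.0 * np.mean(np.abs(pred - dft))

    # Hit rate: of the candidates the model calls stable (at or below the
    # hull within `stable_tol`), the fraction that DFT confirms as stable.
    predicted_stable = pred <= stable_tol
    if not predicted_stable.any():
        return mae_mev, 0.0
    hit_rate = float(np.mean(dft[predicted_stable] <= stable_tol))
    return mae_mev, hit_rate


# Toy usage with made-up numbers (eV/atom):
mae, hits = evaluate_screening([0.010, -0.020, 0.200], [0.030, -0.010, 0.250])
```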
Discussion
The research significantly advances the field of materials discovery by demonstrating the power of scaled deep learning. The order-of-magnitude increase in the number of known stable crystals opens up a vast new chemical space for exploration. The ability to predict the stability of materials with high accuracy and to generalize to unseen chemical spaces is a major breakthrough. The development of accurate and transferable MLIPs further enhances the utility of this approach by enabling efficient and reliable simulations of material properties. These findings highlight the potential of combining large-scale data generation with advanced machine learning to tackle complex scientific challenges. The success of the GNoME models in predicting stability and various material properties using significantly fewer resources than traditional methods positions them as a powerful tool for future materials science research. The emergent out-of-distribution generalization suggests that continued scaling of these models will further enhance their predictive power and utility.
Conclusion
This study showcases the significant potential of scaled deep learning for materials discovery. The GNoME models have substantially expanded the number of known stable inorganic crystals and enabled the development of highly accurate and transferable MLIPs. Future work could focus on improving the understanding of phase transitions, dynamic stability, and, ultimately, the synthesizability of these newly discovered materials. The success of this approach suggests that large-scale, data-driven methods have the potential to revolutionize materials science by accelerating the discovery and characterization of new materials with desirable properties.
Limitations
While the study significantly expands the set of known stable crystal structures, it does not address the synthesizability of these materials. Furthermore, the accuracy of the models relies on the accuracy of the underlying DFT calculations, which have inherent limitations. The models may not fully capture complex phenomena such as phase transitions or the effects of configurational entropy, which could influence the actual stability and properties of these materials. Finally, the current models focus predominantly on bulk materials and may not be directly applicable to other material types or nanostructures.