Band gap predictions of double perovskite oxides using machine learning

A. Talapatra, B. P. Uberuaga, et al.

This research, conducted by Anjana Talapatra, Blas Pedro Uberuaga, Christopher Richard Stanek, and Ghanshyam Pilania, explores a hierarchical machine learning approach to predict the band gap of double perovskite oxides, screening a chemical space of 5.2 million compositions and identifying 310 promising candidates for experimental investigation.
Introduction

The study addresses the challenge of accurately and efficiently predicting band gaps in oxide perovskites across a vast chemical space. The band gap is central to applications in electronics, optoelectronics, and energy technologies, yet standard DFT with local or semi-local functionals systematically underestimates band gaps, and high-fidelity methods are too costly for large-scale screening. The authors propose a hierarchical machine learning framework that first classifies materials as wide-gap (Eg ≥ 0.5 eV) versus narrow-gap/metallic, then performs regression to predict Eg for the wide-gap class, enabling rapid down-selection of promising double perovskite oxide candidates that are also predicted to be formable and thermodynamically stable. The purpose is to efficiently explore millions of compositions, derive physical insight via feature importance and partial dependence, and deliver a prioritized list of novel, stable, wide-gap double perovskites for further investigation.

Literature Review

Prior works have used a variety of ML methods (SVR, ANN, OLSR, LASSO) to predict band gaps in inorganic and organic materials, often with elemental descriptors. Early efforts (Gu et al.) used small experimental datasets; later studies expanded to computational datasets and explored ensemble learning and multi-fidelity approaches (e.g., Pilania et al. with co-Kriging and general band gap frameworks; GNN-based multi-fidelity models by Na et al. and Li et al.; graph learning for diverse perovskites by Omprakash et al.). Some studies prioritized interpretability (ACE) or combined disparate datasets (ensemble learning). In perovskites, many efforts focused on single perovskites or limited double perovskite sets, constraining generalizability. Use of features like formation energy can improve accuracy but is costly for large-scale predictions. The present work extends prior efforts by covering a much larger and chemically diverse oxide perovskite space, linking formability, stability, insulator classification, and quantitative band-gap prediction, and delivering concrete, novel candidate lists.

Methodology
  • Chemical space and datasets: Considered 68 elements for A- and B-sites and oxygen as anion, enumerating all charge-balanced ABO3, A2BB′O6, AA′B2O6, and AA′BB′O6 perovskites. After accounting for duplicates and valence combinations, 946,292 unique compositions were generated. The training set Dp comprises 5,152 cubic oxide perovskites whose structures were DFT-relaxed (cubic constraint) and band gaps computed. Of these, 1,575 with Eg ≥ 0.5 eV formed the regression training set DBG.
  • Candidate sets: Removing the 5,152 training entries yields Dc = 941,140 chemically compatible candidates. Previously developed ML models (Talapatra et al., 2021) for formability and thermodynamic stability were applied to obtain DFS = 462,248 predicted formable and stable cubic candidates (probability cutoff 0.5).
  • Labels: Using Eg threshold 0.5 eV, training compounds were labeled as wide-gap (insulator) or narrow-gap (metal/very small gap).
  • Descriptors: Started from 68 elemental/structural features; after screening (Pearson correlation) and Recursive Feature Elimination (RFE), retained 28 descriptors: 24 atom-specific compound descriptors (for A and B sites, symmetric and anti-symmetric combinations of elemental HOMO, LUMO, ionization energy, electronegativity, Zunger pseudopotential radius, electron affinity) plus 4 perovskite geometric descriptors (Goldschmidt tolerance factor t, octahedral factor u, and mismatch factors μA, μB based on Shannon ionic radii). For classification, all 28 were important; for regression, 21 atom-specific plus all 4 geometric were used.
  • ML models: Random Forest (Scikit-learn). Classification model Mc trained on Dp to separate wide vs narrow (five-fold CV; typical 90/10 train/test). Regression model MR trained on DBG to predict Eg for wide-gap materials (five-fold CV; typical 90/10). Hyperparameters: Mc: n_estimators=200, max_depth=25; MR: n_estimators=200, max_depth=50. Stratified splits for classification.
  • DFT details: VASP with PBE-GGA; spin-polarized; plane-wave cutoff 533 eV; Monkhorst-Pack k-point grid with ≥5000 k-points per reciprocal atom; Methfessel-Paxton smearing (order 1); structural relaxations to 1e-6 eV energy change and <0.01 eV/Å forces; final static calculations for band gaps. Atomic HOMO/LUMO levels computed for isolated atoms (non-spin-polarized) in large vacuum cells.
  • Uncertainty quantification: Confidence intervals for MR via bias-corrected infinitesimal jackknife / jackknife-after-bootstrap approaches; also examined prediction confidence trends.
  • Workflow application: Sequentially applied Mc to DFS to identify wide-gap candidates (probability ≥0.5), yielding DW = 13,589. Applied MR to predict Eg for DW, then down-selected high-confidence candidates with ≥0.9 probability for formability, stability, and wide-gap classification, yielding DS = 310. Validating DFT (PBE) calculations were performed for all 310 candidates to confirm the predictions. Generated design maps and partial dependence plots for feature interpretation.
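The two-step screen described above (classify first, regress only on predicted insulators) can be sketched with scikit-learn. This is a minimal sketch on synthetic data: the feature matrix, band gaps, and candidate pool below are random stand-ins, and only the 0.5 eV threshold, the 90/10 stratified split, and the stated Random Forest hyperparameters come from the text. The per-tree spread at the end is a simple uncertainty proxy, not the bias-corrected infinitesimal-jackknife estimate the paper uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 28-descriptor feature matrix and PBE band gaps.
X = rng.normal(size=(2000, 28))
E_g = np.clip(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=2000), 0.0, None)

# Step 1: classifier Mc separates wide-gap (Eg >= 0.5 eV) from narrow-gap/metallic.
y_class = (E_g >= 0.5).astype(int)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    X, y_class, test_size=0.1, stratify=y_class, random_state=0)
M_c = RandomForestClassifier(n_estimators=200, max_depth=25, random_state=0)
M_c.fit(Xc_tr, yc_tr)

# Step 2: regressor MR is trained only on the wide-gap subset (the D_BG analogue),
# so metals and near-zero gaps do not bias the fit.
wide = y_class == 1
M_R = RandomForestRegressor(n_estimators=200, max_depth=50, random_state=0)
M_R.fit(X[wide], E_g[wide])

# Screening: keep candidates classified wide-gap with probability >= 0.5,
# then predict their band gaps with the regressor.
X_cand = rng.normal(size=(500, 28))
p_wide = M_c.predict_proba(X_cand)[:, 1]
selected = X_cand[p_wide >= 0.5]
E_g_pred = M_R.predict(selected)

# Rough per-candidate uncertainty proxy: spread of the individual tree predictions.
tree_preds = np.stack([t.predict(selected) for t in M_R.estimators_])
sigma = tree_preds.std(axis=0)
print(selected.shape[0], "candidates predicted wide-gap")
```

In the actual workflow, X would hold the 28 retained descriptors and the candidate pool would be DFS; raising the `predict_proba` cutoff from 0.5 to 0.9 gives the high-confidence down-selection step.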
Key Findings
  • Classification performance: Five-fold CV with 90/10 split achieved training accuracy ≈0.94 and test accuracy ≈0.95; training precision ≈0.95 and test precision ≈0.93. ROC AUC ≈0.98; precision-recall AUC ≈0.96 at threshold 0.5. Most important features: B-site symmetric HOMO, electronegativity, ionization energy; followed by A-site electronegativity and LUMO.
  • Regression performance: With 90/10 split over 100 runs, training R^2 ≈0.97 and MAE ≈0.07 eV; test R^2 ≈0.86 and MAE ≈0.18 eV. Learning curves indicate low bias and moderate variance. Feature importance favors B-site descriptors (electronegativity, HOMO/LUMO, Zunger radius) and octahedral factor; geometric t, u have low importance overall.
  • Screening outcomes: From 462,248 formable/stable candidates (DFS), 13,589 (≈2.94%) predicted wide-gap (Eg ≥0.5 eV) by Mc at ≥0.5 probability. High-confidence intersection (≥0.9 for formability, stability, wide-gap) yields 310 candidates.
  • Validation: DFT PBE calculations confirm all 310 are wide-gap (Eg > 0.5 eV). Predicted vs calculated band gaps for the 310: MAE 0.21 eV, MSE 0.07 eV², R^2 0.84, maximum absolute error 0.48 eV.
  • Design insights: Design maps (e.g., Ba2BB′O6) show that B-site choices like Ta, Sb, In, La generally increase Eg, with Ta–In combinations reaching up to ~3.9 eV predicted; Bi on B-site tends to lower Eg. Partial dependence analyses identify ranges of A-/B-site electronegativities associated with larger gaps.
Discussion

The hierarchical ML strategy effectively addresses large-scale band-gap screening by first excluding metallic or very small-gap perovskites, thereby preventing bias in the regression model and enabling accurate Eg predictions within the insulating domain. The strong test metrics for both classification (AUC ~0.98) and regression (MAE ~0.18 eV on held-out data) validate the approach across a chemically diverse space spanning 68 elements. The sequential application to a vast candidate pool yielded 13,589 predicted wide-gap materials, and stringent high-confidence filtering produced 310 novel, predicted formable and stable double perovskites with Eg > 0.5 eV, all validated by DFT. Feature importance and partial dependence analyses illuminate chemistry–property relationships, particularly the dominant role of B-site descriptors, consistent with the oxygen p–transition metal d character of VBM/CBM in oxides. The resulting design maps and trends provide practical guidance for band-gap and band-edge engineering. Despite absolute underestimation by PBE, relative trends are reliably captured, and additional HSE checks (for a subset) confirm systematic PBE underestimation, supporting the use of PBE-trained models for trend-driven discovery.

Conclusion

The work introduces a unique, large-scale hierarchical ML framework that unifies predictions of perovskite formability, thermodynamic stability, insulating nature, and quantitative band gap across a vast oxide double perovskite chemical space. Using low-cost descriptors and robust Random Forest models trained on 5,152 DFT-labeled perovskites (with 1,575 insulators for regression), the study identified 13,589 wide-gap candidates among 462,248 formable/stable compositions and down-selected 310 high-confidence, novel, wide-gap double perovskites. All 310 were validated by DFT to be insulating, with close agreement between predicted and calculated gaps (MAE ~0.21 eV). The approach produces interpretable insights (feature importances, PDPs) and practical design maps, and is readily generalizable to other materials classes. Future work could incorporate higher-fidelity targets (e.g., hybrid-DFT, GW) via multi-fidelity learning, extend to lower-symmetry phases, integrate uncertainty-aware active learning to target underexplored regions of feature space, and drive experimental synthesis and characterization of the proposed candidates.

Limitations
  • Structural constraint: Training and screening are restricted to cubic perovskite symmetry to maintain computational feasibility. Some compounds that are insulating in lower-symmetry phases may be missed if their cubic-phase gaps fall below 0.5 eV.
  • DFT fidelity: PBE-GGA underestimates absolute band gaps; while trends are reliable, quantitative gaps may require hybrid/GW or experiment for final validation.
  • Descriptor scope: To keep inputs low-cost, the models exclude higher-cost features (e.g., formation energies, charge densities), which could marginally improve accuracy but reduce scalability.
  • Data coverage: The vast chemical space inevitably has sparsely sampled regions; PDP-derived optimal feature ranges may be underrepresented in the training data, limiting extrapolative confidence.
  • Threshold choice: The 0.5 eV wide/narrow cutoff is illustrative; application-specific thresholds may shift candidate sets and performance metrics.