Introduction
The discovery of new materials with desired properties is a significant challenge because the relationship between properties and controllable variables is complex and experimental synthesis and characterization are costly. Computational methods offer a cost-effective alternative, but their accuracy often falls short of experimental measurements. Traditional materials discovery workflows employ a "computational funnel" approach, using increasingly accurate (and expensive) methods to filter a large initial library. However, this approach has drawbacks: it requires prior knowledge of method accuracies and costs, resources must be allocated in advance, and the methods may be mis-ordered. Machine learning has emerged as a tool to improve efficiency, but existing data-driven models can be limited by data availability. This paper proposes an alternative: multi-fidelity Bayesian optimization with a dynamically evolving model that relates the different fidelities (experimental and computational methods) to each other. This progressive, rather than hierarchical, approach allows flexible resource allocation and does not require a priori knowledge of method accuracies, offering budget-aware and accuracy-aware materials discovery.
Literature Review
The paper reviews existing approaches to materials discovery, highlighting the limitations of computational funnels. It discusses the use of machine learning in materials screening, particularly multi-fidelity machine learning approaches that integrate data from various sources to build more accurate predictive models. The authors note previous work that used multi-fidelity models to predict properties such as band gaps, as well as the successes of Bayesian optimization in various fields, including materials discovery. The review emphasizes two weaknesses of current techniques: they handle the uncertainties inherent in machine learning models poorly, and they make it difficult to determine the optimal allocation of resources across methods in advance. The paper positions its proposed method as an improvement over existing multi-fidelity models and Bayesian optimization techniques.
Methodology
The authors propose a multi-fidelity Bayesian optimization approach named Targeted Variance Reduction (TVR). This method iteratively trains a probabilistic model using a multi-output Gaussian process to link data from multiple fidelities (experimental and computational methods). The choice of material and fidelity for each iteration is determined by an acquisition function (Expected Improvement, EI) that balances exploration and exploitation, while simultaneously considering the cost and informativeness of each fidelity. The algorithm selects the material/fidelity combination that minimizes the variance of the model prediction at the point with the greatest acquisition function score, scaled by the cost. The multi-output Gaussian process uses a one-hot encoding for fidelities, with the high-fidelity reference mapping to the zero vector. A Matern 5/2 kernel with automatic relevance determination is used, and hyperparameters are optimized via the log marginal likelihood. The paper details the algorithm's steps and provides pseudo-code in the supplementary information. The authors also describe their synthetic dataset generation method, using Liu's 1D function and a process to control the correlation between the lower-fidelity proxies and the ground truth. Three materials discovery datasets were used for evaluation: the Harvard Organic Photovoltaic dataset (HOPV), the Alexandria quantum chemical library, and the Chen Alchemical library. For each dataset, the authors compare their approach to a computational funnel (with ideally provisioned budgets), Bayesian optimization at the target fidelity (EI), and random search.
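The selection step described above can be sketched in a few dozen lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`encode`, `matern52`, `gp_posterior`, `expected_improvement`, `tvr_select`) are hypothetical, the Gaussian process has zero mean and fixed lengthscales (the paper optimizes hyperparameters via the log marginal likelihood), and "variance reduction scaled by cost" is read here as the closed-form drop in posterior variance at the EI maximizer divided by the fidelity cost. The toy data at the end are invented for demonstration only.

```python
import math
import numpy as np


def encode(x, fidelity, n_proxies):
    """Append a one-hot fidelity code; the target fidelity (index 0) is the zero vector."""
    code = np.zeros(n_proxies)
    if fidelity > 0:
        code[fidelity - 1] = 1.0
    return np.concatenate([np.atleast_1d(x).astype(float), code])


def matern52(A, B, lengthscales):
    """Matern 5/2 kernel with one lengthscale per input dimension (ARD)."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    r = np.sqrt(np.sum(diff ** 2, axis=-1))
    s = math.sqrt(5.0) * r
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)


def gp_posterior(X, y, Xq, lengthscales, noise=1e-6):
    """Zero-mean GP posterior mean and covariance at the query points Xq."""
    K = matern52(X, X, lengthscales) + noise * np.eye(len(X))
    Ks = matern52(X, Xq, lengthscales)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = matern52(Xq, Xq, lengthscales) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov


def expected_improvement(mean, std, best):
    """Expected Improvement for maximization."""
    std = np.maximum(std, 1e-12)
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def tvr_select(X, y, candidates, costs, lengthscales, best_target, noise=1e-6):
    """Return (material index, fidelity index, score) for the next evaluation."""
    n_proxies = len(costs) - 1
    targets = np.array([encode(x, 0, n_proxies) for x in candidates])
    mean, cov = gp_posterior(X, y, targets, lengthscales, noise)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    # Step 1: find the material with the highest EI at the target fidelity.
    star = int(np.argmax(expected_improvement(mean, std, best_target)))
    best = (None, None, -np.inf)
    # Step 2: score every (material, fidelity) pair by how much observing it
    # would shrink the posterior variance at that material, per unit cost.
    for f, cost in enumerate(costs):
        for i, x in enumerate(candidates):
            pair = np.vstack([targets[star], encode(x, f, n_proxies)])
            _, joint = gp_posterior(X, y, pair, lengthscales, noise)
            reduction = joint[0, 1] ** 2 / (max(joint[1, 1], 0.0) + noise)
            if reduction / cost > best[2]:
                best = (i, f, reduction / cost)
    return best


# Toy usage (invented data): one expensive target fidelity and one cheap proxy.
candidates = np.linspace(0.0, 1.0, 11)
costs = [10.0, 1.0]  # index 0 is the target fidelity
observed = [(0.1, 0), (0.9, 0), (0.2, 1), (0.5, 1), (0.8, 1)]
X = np.array([encode(x, f, 1) for x, f in observed])
y = np.array([np.sin(6 * x) + 0.2 * f for x, f in observed])
best_target = max(v for v, (_, f) in zip(y, observed) if f == 0)
material, fidelity, score = tvr_select(
    X, y, candidates, costs, np.array([0.3, 1.0]), best_target
)
```

In this sketch the lengthscale on the one-hot dimension controls how strongly the proxy is assumed to correlate with the target: a short lengthscale makes proxy observations nearly uninformative about the target, so the cost-scaled score shifts toward direct target evaluations.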
Key Findings
The study demonstrates the effectiveness of the TVR-EI algorithm through its application to both synthetic and real-world materials discovery problems. Experiments with a synthetic dataset showed that TVR-EI outperformed the computational funnel when the proxy measurements were either relatively expensive or accurate, whereas the computational funnel performed better when the proxies were both cheap and inaccurate. The significant difference in performance between the two methods highlights TVR-EI’s ability to allocate resources dynamically based on the information gained during the optimization process, unlike the computational funnel, where resources are pre-allocated. In the three real-world materials discovery challenges (Alexandria, HOPV-15, and Chen datasets), TVR-EI achieved comparable or superior performance to both the computational funnel (even with perfectly provisioned budgets) and single-fidelity Bayesian optimization (EI). The results show that, on average, TVR-EI achieved a 2.8x efficiency gain and a 20% reduction in regret compared to the other methods. The analysis of the budget allocation across different fidelities reveals that TVR-EI efficiently utilizes lower-fidelity proxies when they are informative but avoids spending resources on uninformative or highly correlated proxies. For the Chen dataset, both the computational funnel and TVR-EI significantly reduced the cost compared to random search and single-fidelity optimization. For the HOPV dataset, single-fidelity Bayesian optimization and TVR-EI outperformed the computational funnel, which struggled with poorly correlated fidelities. The Alexandria dataset represented an intermediate case in which the computational funnel, single-fidelity Bayesian optimization, and TVR-EI performed comparably, although TVR-EI still demonstrated an advantage. Analysis of over- and under-provisioned funnels further illustrated the advantages of the dynamic resource allocation in TVR-EI.
Discussion
The findings demonstrate that the proposed TVR-EI algorithm offers a significant improvement over traditional computational funnels and single-fidelity Bayesian optimization for materials discovery. The ability to dynamically adjust resource allocation based on the learned relationships between fidelities leads to greater efficiency and reduced optimization costs. The robustness of TVR-EI to uninformative proxies and its capacity to leverage internal correlations within the data are key advantages. The variations in performance across the different datasets highlight the importance of considering the relative cost and correlation of different fidelities when selecting an optimization strategy. The study suggests that TVR-EI is particularly beneficial when the relationships between fidelities are complex or when the optimal allocation of resources is not easily determined a priori.
Conclusion
The paper concludes that the TVR-EI algorithm offers a promising approach to high-throughput materials screening. It effectively combines the benefits of multi-fidelity machine learning and Bayesian optimization, yielding significant gains in efficiency and reductions in cost compared to established methods. Future research could explore different acquisition functions, kernel choices, and applications to broader materials systems and properties.
Limitations
The study relies on the availability of data at multiple fidelities, and the method's performance depends on the quality of these fidelities and on how well they correlate with one another. The cost of evaluating each fidelity was assumed to be known and constant, which might not always hold in real-world scenarios. The evaluation focused on three specific datasets, so generalizability to other materials discovery problems requires further investigation.