Introduction
Machine learning (ML) is revolutionizing materials science, accelerating the discovery of novel functional materials. However, unlike the mature bioinformatics field with its thousands of readily accessible web servers, the materials informatics ecosystem lacks user-friendly web applications. This shortage hinders the adoption of powerful ML-driven tools by experimental materials scientists, who often lack the expertise to implement and deploy these tools locally. Widely used web services such as the Materials Project, AFLOWLIB, and OQMD serve primarily as data sources and offer limited analysis tools. Exploratory materials discovery involves several stages: composition exploration, structure prediction and validation, property prediction, and materials design. Each stage calls for its own web applications. For instance, composition exploration needs tools to check charge neutrality, electronegativity balance, and formation energy. Structure validation requires tools for structural relaxation, formation energy calculation, Pauling rule checks, and phonon calculations. Property prediction web apps are available, but they often lack support for high-throughput screening and convenient result downloads. The lack of comprehensive, user-friendly web applications for these stages motivates the development of a platform such as MaterialsAtlas.org.
Literature Review
The paper reviews existing materials informatics web services, categorizing them into four types: composition/structure analysis tools (e.g., those for coordination environment prediction from X-ray absorption spectroscopy), materials property prediction tools (e.g., AFLOW-ML, JARVIS-ML), utility tools for structure and composition analysis (e.g., the Crystal Toolkit from the Materials Project), and materials design tools (e.g., polymer designers). The review highlights the limitations of current web apps: ad-hoc development, inability to handle multiple inputs for screening, lack of reported performance measures, and outdated algorithms. The authors note the absence of easily accessible web-based tools for tasks such as crystal structure prediction and phonon calculation, which are computationally expensive in high-throughput screening. Drawing on the success of bioinformatics web servers, the review emphasizes the need for user-friendly, comprehensive platforms that integrate tools across the whole materials discovery process. The authors contrast the rich ecosystem of bioinformatics web servers (over 9,000) with the far fewer (under 100) and less diverse materials informatics tools.
Methodology
MaterialsAtlas.org is designed to address the identified gaps. It provides four categories of web apps:
1. **Composition and Structure Validation:** This includes charge neutrality and electronegativity balance checks (using the SMACT package), Pauling rules checks, formation energy and e-above-hull energy checks (using Bayesian optimization and ML models like Roost and DeeperGATGNN), prediction of crystal symmetry and lattice parameters (using neural networks), and template-based crystal structure prediction (using the TCSP algorithm).
2. **Materials Property Prediction:** MaterialsAtlas employs both composition-based and structure-based ML models for predicting various properties. Composition-based models (e.g., Roost, CrabNet) use chemical composition descriptors, while structure-based models (e.g., DeeperGATGNN, CGCNN) leverage structural information. Specific properties addressed include 2D material prediction (Random Forest), noncentrosymmetric material prediction (Random Forest), band gap (CrabNet and DeeperGATGNN), elastic moduli (Roost and DeeperGATGNN), hardness (Roost and DeeperGATGNN), thermal conductivity (Roost and CGCNN), and superconductor transition temperature (Random Forest and CrabNet). The models are trained using datasets from Materials Project and other sources. While the current implementation provides single-point predictions, the authors plan to incorporate uncertainty estimation in future upgrades.
3. **Hypothetical Material Screening:** MaterialsAtlas includes databases of hypothetical materials generated using generative models (MATGAN for compositions, CubicGAN for cubic structures). These databases allow users to screen for materials with desired properties.
4. **Utility Tools:** This category contains tools for chemical composition enumeration (based on SMACT), feature generation for custom ML model training, composition and structure search, and standard file format conversion. The platform also employs Earth Mover's Distance for compositional similarity search and computed XRD features for structural similarity search. Future additions include phonon prediction, synthesizability prediction, ion conductivity prediction, and improved visualization tools.
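Two of the simpler operations in the list above, the charge-neutrality check and the Earth Mover's Distance between compositions, can be sketched in plain Python. This is a minimal illustration and not the platform's implementation: it does not call SMACT, the `OXIDATION_STATES` table is a small hand-picked subset, each element is assumed to take a single oxidation state within a formula, and atomic number is used as a stand-in for whatever element ordering the platform actually uses in its similarity search. The function names are hypothetical.

```python
from itertools import product

# Illustrative subset of common oxidation states (assumption: a real check
# would draw on a curated table such as the one shipped with SMACT).
OXIDATION_STATES = {
    "Na": [1], "K": [1], "Mg": [2], "Fe": [2, 3], "Ti": [2, 3, 4],
    "O": [-2], "Cl": [-1], "S": [-2, 4, 6],
}

# Element positions on a 1D axis. Atomic number is an assumption made here
# for simplicity; other element scales could be substituted.
ATOMIC_NUMBER = {
    "O": 8, "Na": 11, "Mg": 12, "S": 16, "Cl": 17, "K": 19, "Ti": 22, "Fe": 26,
}


def neutral_assignments(composition):
    """Return every oxidation-state assignment with zero net charge.

    composition maps element symbol -> integer count, e.g. {"Fe": 2, "O": 3}.
    Each element takes one oxidation state for all of its atoms.
    """
    elements = list(composition)
    choices = [OXIDATION_STATES[el] for el in elements]
    found = []
    for states in product(*choices):
        total = sum(q * composition[el] for q, el in zip(states, elements))
        if total == 0:
            found.append(dict(zip(elements, states)))
    return found


def composition_emd(comp_a, comp_b):
    """1D Earth Mover's Distance between two compositions.

    Compositions are normalized to atomic fractions, placed on the
    ATOMIC_NUMBER axis, and compared by accumulating the mass that has to
    be carried across each gap between neighboring elements.
    """
    def fractions(comp):
        total = sum(comp.values())
        return {el: n / total for el, n in comp.items()}

    fa, fb = fractions(comp_a), fractions(comp_b)
    elements = sorted(set(fa) | set(fb), key=ATOMIC_NUMBER.__getitem__)
    distance, carried = 0.0, 0.0
    for left, right in zip(elements, elements[1:]):
        carried += fa.get(left, 0.0) - fb.get(left, 0.0)
        distance += abs(carried) * (ATOMIC_NUMBER[right] - ATOMIC_NUMBER[left])
    return distance
```

With this table, `neutral_assignments({"Fe": 2, "O": 3})` returns `[{"Fe": 3, "O": -2}]`, while `{"Na": 1, "Cl": 2}` yields an empty list. Because of the single-oxidation-state assumption, mixed-valence compounds such as Fe3O4 are rejected by this sketch. `composition_emd({"Na": 1, "Cl": 1}, {"K": 1, "Cl": 1})` gives 4.0, the cost of moving half of the composition from position 11 to position 19.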
The system architecture uses Django for the back-end, Vue.js for the front-end, Redis for job queuing, and Docker for containerization. Python is used for back-end computations.
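The role of the job queue in such an architecture can be illustrated with a small in-memory stand-in. The sketch below is an assumption-laden illustration only: it uses Python's standard library instead of Redis, and the `JobQueue` class, its method names, and the `count_atoms` task are hypothetical and do not come from the MaterialsAtlas code base. It shows the general pattern in which a web request enqueues a computation and returns a job ID immediately, a worker later runs the job, and the client polls for the result.

```python
import uuid
from collections import deque


class JobQueue:
    """In-memory stand-in for a Redis-backed job queue (illustration only)."""

    def __init__(self):
        self._pending = deque()   # jobs waiting for a worker, oldest first
        self._jobs = {}           # job_id -> {"status": ..., "result": ...}

    def submit(self, func, *args):
        """Enqueue a job and return its ID without running it."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "queued", "result": None}
        self._pending.append((job_id, func, args))
        return job_id

    def run_next(self):
        """Worker step: run the oldest pending job. Returns False if idle."""
        if not self._pending:
            return False
        job_id, func, args = self._pending.popleft()
        job = self._jobs[job_id]
        job["status"] = "running"
        try:
            job["result"] = func(*args)
            job["status"] = "finished"
        except Exception as exc:  # record the failure instead of crashing the worker
            job["result"] = repr(exc)
            job["status"] = "failed"
        return True

    def status(self, job_id):
        return self._jobs[job_id]["status"]

    def result(self, job_id):
        return self._jobs[job_id]["result"]


def count_atoms(composition):
    """Hypothetical task standing in for a long-running prediction."""
    return sum(composition.values())
```

A request handler would call `submit(count_atoms, {"Fe": 2, "O": 3})` and return the job ID to the browser; the status stays `"queued"` until a worker calls `run_next()`, after which `result(job_id)` holds the value. Decoupling submission from execution in this way keeps slow computations from blocking web requests.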
Key Findings
MaterialsAtlas.org integrates a range of tools for materials discovery into a user-friendly web platform. The platform provides tools for composition and structure validation, including charge neutrality, electronegativity balance, Pauling rules checks, formation energy and e-above-hull energy calculations, and crystal structure prediction. It offers composition-based and structure-based ML models for predicting material properties (band gap, elastic moduli, hardness, thermal conductivity, 2D material tendency, noncentrosymmetry, and others) with varying accuracies. The platform also includes databases of hypothetical materials for screening and utility tools for composition enumeration, feature generation, and structure manipulation. Specific performance metrics (e.g., accuracy, MAE) are reported for several prediction models. The authors demonstrate the platform's usefulness through application examples and interactive features for visualizing materials properties in a multi-dimensional design space. They also highlight the capacity for future expansion through integration of third-party applications and API services.
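For readers unfamiliar with the two metrics named above, the following sketch shows how mean absolute error (for regression targets such as band gap) and accuracy (for classification targets such as 2D versus non-2D) are computed. The function names are illustrative, and any numbers passed to them in examples are made up; they are not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples (regression metric)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels, predictions):
    """Fraction of samples whose predicted class matches the true class."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(labels, predictions) if t == p)
    return correct / len(labels)
```

For example, with made-up values, `mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])` is 0.5 and `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` is 0.75. MAE is expressed in the units of the predicted property, so values are only comparable across models for the same property and dataset.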
Discussion
MaterialsAtlas.org addresses several limitations of existing materials informatics platforms by providing a comprehensive, user-friendly, and integrated environment for materials discovery. It brings together state-of-the-art ML models and tools for the different stages of the discovery process. Its modular design allows continuous expansion through the integration of community-developed tools and supports collaborative development. Because it is built on widely available technologies such as Django and Vue.js, researchers outside the core development team should be able to adopt and extend it with relatively little effort. The platform's support for high-throughput screening and its range of prediction capabilities can accelerate materials discovery and optimization. By lowering the barrier to entry for materials researchers, MaterialsAtlas.org facilitates wider adoption of data-driven tools in materials research.
Conclusion
MaterialsAtlas.org provides a valuable contribution to the field of materials informatics by offering a comprehensive, user-friendly platform for materials discovery. Its integrated suite of tools addresses critical limitations of existing resources, facilitating high-throughput screening, property prediction, and hypothetical material exploration. The platform's modular and extensible architecture enables continuous growth and community contribution. Future work will focus on expanding functionality to include more sophisticated prediction models, improved uncertainty quantification, and more advanced visualization tools.
Limitations
The current version of MaterialsAtlas.org does not include uncertainty estimates in its property predictions; this is planned for future updates. While several ML models are included, the performance of some models might be limited by the size and quality of the training data, particularly for properties with limited experimental datasets (e.g., ion conductivity). The platform’s reliance on existing databases for training data implies that biases inherent in these databases may also influence the performance of the prediction models. The platform’s functionality is limited to inorganic materials at present.