MaterialsAtlas.org: a materials informatics web app platform for materials discovery and survey of state-of-the-art

J. Hu, S. Stefanov, et al.

MaterialsAtlas.org is a web platform for materials discovery. Its tools let users validate compositions, predict properties, and search databases of hypothetical materials.

Introduction
The paper addresses the gap between the rapid progress of machine learning in materials science and the paucity of user-friendly web applications to operationalize these methods. While vast materials datasets and predictive/generative algorithms now exist, most experimental groups lack the expertise to implement, train, and deploy such tools, limiting their adoption. The authors survey the state-of-the-art in materials informatics web apps (finding fewer than 100, largely data repositories) and argue for accessible web services spanning the exploratory discovery workflow—from composition checks and structural validation to property prediction, screening hypothetical materials, and design utilities. They propose MaterialsAtlas.org as a platform to lower barriers, enable high-throughput screening (including batch inputs and downloads), and improve the diversity and quality of web tools compared to established bioinformatics ecosystems.
Literature Review
The survey covers four categories of materials web tools:
- Characterization: ML for XRD phase mapping, electron diffraction symmetry determination, deep learning for powder diffraction, and automated crystal structure analysis in Rietveld workflows. Few provide public web services; one noted tool predicts coordination environments from X-ray absorption spectroscopy.
- Property prediction: aflow-ML, JARVIS-ML, Crystal.AI, thermoelectric predictors, NIMS tools, SUNCAT catalysis property predictors, and matlearn. Many accept only single inputs (not suitable for screening), provide no prediction confidence, and often use outdated descriptors/algorithms. Benchmarks show graph neural networks (GNNs) outperform older descriptor-based methods for key properties like formation energy and band gap.
- Utility tools: Crystal toolkit, phase diagrams (Materials Project, OQMD), prototype finder (AFLOW), JARVIS analysis tools, Matgenie, phonon visualizer (Materials Cloud), and crystallography tools (Bilbao server).
- Design tools: Polymer designer, Matlearn composition explorer, SUNCAT catalysis designer, JARVIS heterostructure designer.
Additionally, powerful offline CSP tools (USPEX, CALYPSO) and platforms like JAMIP exist. The authors highlight the need for integrated, batch-capable, modern ML-driven web apps with downloadable outputs.
Methodology
Platform scope and tools:
- Composition and structure validation: Charge neutrality and electronegativity balance checks using SMACT (with speedups) from composition; Pauling rules (first three) checks from structure; thermodynamic stability assessment via formation energy and e-above-hull. For rapid energy estimation, two ML models are used: composition-only Roost (stoichiometry-based deep learning) and structure-based DeeperGATGNN (a deep global-attention GNN). e-above-hull computed via Pymatgen.
- Symmetry and lattice parameter prediction: Neural network models predict space groups and crystal systems from composition, and another deep model estimates lattice parameters (high accuracy for cubic, reasonable for others).
- Template-based crystal structure prediction (TCSP): Given a formula (and optional space group), the app proposes candidate structures by leveraging known templates from databases, generating multiple hypothetical structures.
- Property prediction: Composition-based models include Random Forest (2D/layered classification; noncentrosymmetry), CrabNet (band gap), Roost (elastic moduli, hardness, thermal conductivity), and Random Forest/CrabNet (superconductor Tc). Structure-based models include DeeperGATGNN (band gap, elastic moduli, hardness, thermal conductivity) and CGCNN (some tasks). Training data primarily from Materials Project and ICSD; structure-based datasets include tens of thousands of samples for moduli and band gaps, and 2701 ICSD samples for thermal conductivity.
- Generative design and screening: Composition generation via MATGAN/WGAN; cubic structure generation via CubicGAN (yielding verified stable cubic materials and new prototypes). The platform hosts searchable hypothetical databases: compositions, lithium compounds, cubic structures, and 2D candidates screened by a trained classifier.
- Utility tools: Composition enumerator (SMACT-based) with doping options; feature generation pipeline (composition, structural, electronic descriptors); click-and-run ML pipelines for user-specified composition- or structure-based models; structure file conversion and supercell generation; similarity search for formulas (Earth Mover's Distance) and structures (XRD-based features).
System and deployment:
- Architecture: Django with SQLite3 for hypothetical materials storage; RESTful APIs between Django backend and Vue.js frontend; jobs queued via Redis with Python workers; Ajax for some app communications; Nginx as HTTP server and reverse proxy; Docker for containerized deployment. Job submission leverages Redis queues to mitigate ML latency.
Data and code:
- Training data from public repositories (Materials Project, ICSD, 2DMatPedia, supercon database). Code largely open-source per cited references; additional code available upon request.
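The charge neutrality check listed under composition validation can be illustrated with a minimal sketch. This is not the platform's implementation and does not call SMACT: the `OXIDATION_STATES` table, the function names, and the simplification that every atom of an element shares one oxidation state are all assumptions made for illustration.

```python
from itertools import product

# Illustrative oxidation states only (an assumption for this sketch);
# a real tool such as SMACT ships a far more complete table.
OXIDATION_STATES = {
    "Na": [1],
    "Sr": [2],
    "Ti": [2, 3, 4],
    "Fe": [2, 3],
    "O": [-2],
    "Cl": [-1, 1, 3, 5, 7],
}


def neutral_assignments(composition):
    """Return every oxidation-state assignment that sums to zero charge.

    `composition` maps element symbols to integer counts, e.g. {"Fe": 2, "O": 3}.
    Simplification: all atoms of one element take the same oxidation state,
    so mixed-valence compounds such as Fe3O4 are not recognised.
    """
    elements = list(composition)
    choices = [OXIDATION_STATES[element] for element in elements]
    found = []
    for states in product(*choices):
        total_charge = sum(
            state * composition[element]
            for state, element in zip(states, elements)
        )
        if total_charge == 0:
            found.append(dict(zip(elements, states)))
    return found


def is_charge_neutral(composition):
    """True if at least one oxidation-state assignment balances the charge."""
    return bool(neutral_assignments(composition))


if __name__ == "__main__":
    for formula in ({"Na": 1, "Cl": 1}, {"Fe": 2, "O": 3}, {"Na": 2, "Cl": 1}):
        print(formula, is_charge_neutral(formula))
```

Brute-force enumeration is adequate for the handful of elements in a typical formula. A screening service would also need the electronegativity balance check and a fuller oxidation-state table, both of which this sketch leaves out.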
Key Findings
- Platform deliverable: MaterialsAtlas.org integrates composition/structure validation, modern ML property predictors (composition- and structure-based), generative screening databases, and utility tools with batch input and downloadable outputs.
- Model datasets (structure-based training):
  - Band gap: 36,837 samples, 87 elements.
  - Hardness: 12,854 samples, 85 elements.
  - Bulk/shear moduli: 13,176 samples, 89 elements.
  - Young's modulus: 12,854 samples, 85 elements.
  - Thermal conductivity: 2701 samples, 38 elements.
  - Poisson ratio: 12,858 samples, 85 elements.
- Performance highlights (as reported):
  - 2D material classification (composition, Random Forest): 88.98% accuracy (trained on 6351 positive 2DMatPedia and 15,959 negatives from Materials Project).
  - Noncentrosymmetry (composition, Random Forest): 84.8% accuracy (82,506 samples from Materials Project after filtering conflicting labels).
  - Band gap: MAE 0.465 eV (Roost/DeeperGATGNN on MP dataset).
  - Elastic moduli (CrabNet/DeeperGATGNN on ~12–13k MP samples): MAE 15.7 (Bulk), 18 (Shear), 76.8 (Young's), 8.7 (Poisson ratio).
  - Hardness: MAE 2.42 (composition Roost; structure DeeperGATGNN; MP-derived data).
  - Thermal conductivity: MAE 5.03 W/(m·K) (CrabNet/DeeperGATGNN; 2688–2701 ICSD samples; experimental scale, limited data).
  - Superconductivity Tc: MAE 4.76 K (Random Forest/CrabNet; 25,378 samples from SuperCon).
- Generative outputs: CubicGAN enabled discovery of 31 new cubic prototypes (Fm-3m, F-43m, Pm-3m) with 4 containing stable materials; 506 cubic materials verified stable by phonon calculations; these are searchable on the platform.
- Practical utilities: Composition enumerator, feature generation, similarity search (EMD for formulas, XRD features for structures), ML pipelines for user data, and structure manipulation tools.
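To make the formula similarity search more concrete, the sketch below computes a one-dimensional Earth Mover's Distance between two formulas. The summary says only that EMD is used for formulas. Placing elements on a line by atomic number, the small `ATOMIC_NUMBER` lookup, and the function names are assumptions for illustration; the platform may order elements differently.

```python
import re

# Hypothetical element axis for this sketch: position = atomic number.
ATOMIC_NUMBER = {
    "H": 1, "Li": 3, "O": 8, "Na": 11, "Cl": 17,
    "K": 19, "Ti": 22, "Fe": 26, "Sr": 38, "Ba": 56,
}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}.

    Parentheses and hydrates are not handled in this sketch.
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def fractions(formula):
    """Map each element's axis position to its atomic fraction."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return {ATOMIC_NUMBER[symbol]: n / total for symbol, n in counts.items()}


def formula_emd(formula_a, formula_b):
    """1D Earth Mover's Distance between two compositions.

    In one dimension the EMD equals the area between the two cumulative
    distributions, so a single sweep along the axis is enough.
    """
    pa, pb = fractions(formula_a), fractions(formula_b)
    positions = sorted(set(pa) | set(pb))
    distance = 0.0
    cumulative = 0.0  # running difference of the two cumulative distributions
    for left, right in zip(positions, positions[1:]):
        cumulative += pa.get(left, 0.0) - pb.get(left, 0.0)
        distance += abs(cumulative) * (right - left)
    return distance


if __name__ == "__main__":
    print(formula_emd("NaCl", "KCl"))
    print(formula_emd("SrTiO3", "BaTiO3"))
```

With this axis, swapping Na for K in a 1:1 chloride moves half of the composition eight positions, giving a distance of 4. A chemically motivated ordering would make such distances track substitution likelihood more closely than atomic number does.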
Discussion
The platform addresses the core challenge identified in the introduction: making advanced materials informatics accessible to practitioners. By consolidating validation checks, high-throughput-capable predictors, and searchable hypothetical databases, MaterialsAtlas facilitates end-to-end exploratory discovery—enabling rapid screening of compositions/structures, preliminary stability assessment, and prioritization for costly DFT or experiments. Incorporating modern GNNs and attention-based models improves predictive accuracy over legacy descriptor-based tools, aligning with benchmark findings. Batch input and downloadable results support large-scale screening workflows missing in many existing web apps. The authors outline forthcoming enhancements—uncertainty quantification (ensembles, Bayesian, evidential DL), phonon dispersion and synthesizability predictors, ion conductivity models, extended CSP via deep learning, REST APIs, third-party app integration, and interactive visualization (e.g., t-SNE/XRD maps)—to further reduce barriers and broaden utility. Overall, the platform operationalizes SOTA algorithms within an easy-to-use web environment, potentially accelerating hypothesis generation and materials down-selection for diverse research teams.
Conclusion
The work surveys the landscape of materials informatics web apps and introduces MaterialsAtlas.org, a comprehensive, user-friendly platform supporting composition/structure validation, ML-based property prediction (composition- and structure-driven), generative screening of hypothetical materials, and practical utilities for data-driven discovery. Leveraging modern deep learning (GNNs, attention models), the platform achieves competitive performance across multiple properties and provides batch processing with downloadable outputs. Planned upgrades include uncertainty-aware predictions, phonon/synthesizability modules, ion conductivity models, expanded CSP via deep learning, open APIs, third-party integrations, and interactive design-space visualization. Collectively, these contributions aim to catalyze wider adoption of materials informatics in everyday research and accelerate the discovery of novel functional materials.
Limitations
- Current predictors provide point estimates without calibrated uncertainty; uncertainty-aware methods (ensembles, Bayesian, evidential DL) are planned but not yet deployed.
- Some properties (e.g., thermal conductivity) are trained on relatively small datasets (≈2700 samples), limiting accuracy and generalizability; thermal conductivity results are noted as experimental.
- Ion conductivity prediction is under development due to extremely limited labeled data.
- Dynamic stability (phonon dispersion) prediction and synthesizability assessment are planned; DFT-based phonon calculations remain computationally expensive.
- Composition-only models omit structural information and cannot resolve polymorphism, potentially biasing predictions; structure-based models require known or predicted structures and thus cannot cover the full chemical space.
- Template-based CSP offers fast candidates but crystal structure prediction remains challenging for large/complex systems; deep learning CSP modules are future work.
- Many models are trained primarily on Materials Project and ICSD data; domain shift to other datasets or experimental conditions may affect performance.
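As a rough illustration of the ensemble route to uncertainty mentioned in the first limitation, the sketch below fits several simple models on bootstrap resamples and reports the spread of their predictions. It is not deployed on the platform. The linear model stands in for a real property predictor, and every name and parameter here is hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_ensemble(xs, ys, n_models=20, seed=0):
    """Train one model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        indices = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in indices], [ys[i] for i in indices]))
    return models


def predict_with_uncertainty(models, x):
    """Return (ensemble mean, ensemble standard deviation) at input x."""
    predictions = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(predictions), statistics.stdev(predictions)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.1, 17.2, 18.9]  # made-up data
    models = bootstrap_ensemble(xs, ys)
    print(predict_with_uncertainty(models, 4.5))
```

The standard deviation across ensemble members is only a heuristic indicator of confidence. It is not calibrated, which is why the summary lists calibrated uncertainty as an open limitation.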