Engineering and Technology
Machine learning-based discovery of vibrationally stable materials
S. A. Tawfik, M. Rashid, et al.
This paper presents a machine learning approach by Sherif Abdulkader Tawfik and colleagues for predicting the vibrational stability of materials, a key component of synthesizability. The classifier serves as a fast surrogate for first-principles phonon calculations, making large-scale screening of materials databases practical.
Introduction
The study addresses the challenge of determining whether hypothesized materials in large online databases are synthesizable. The energy above the convex hull (Eh) is widely used as a thermodynamic stability filter (Eh < 100 meV is often taken to imply stability), but this alone is insufficient because vibrational stability is also required. Vibrationally unstable materials exhibit imaginary phonon modes and may have low Eh yet be vibrationally unstable (e.g., LiZnPS4, SiC, Ca3PN). Computing vibrational spectra with first-principles methods at database scale is computationally prohibitive. The research therefore aims to develop a machine learning classifier, trained on a sufficiently large dataset of vibrational stability labels, that rapidly predicts the vibrational stability of inorganic crystals, complementing thermodynamic filters and enabling high-throughput screening for synthesizability.
Literature Review
Prior work has focused on thermodynamic stability and on predicting vibrational properties of already-stable materials. Legrain et al. used ML to predict vibrational properties such as vibrational free energies and entropies for vibrationally stable materials. Existing phonon datasets cover only a small subset of materials: Petretto et al. reported DFPT phonons for 1521 semiconductors (15% unstable), and Choudhary et al. identified 21% of 5015 JARVIS-DFT materials as unstable using DFPT. A ~10K-material finite-difference phonon database exists (phonopy/phonondb), but its results are not text-retrievable. A preprint attempted ML prediction of instabilities in 2D materials. A gap therefore remains for a general ML-based predictor of vibrational stability across diverse inorganic crystals with scalable features.
Methodology
Dataset construction: The MPStability dataset was built from the Materials Project by including all materials with 4 or fewer atoms in the unit cell; for 4-atom cells, a band gap > 0.5 eV was additionally required. The resulting dataset contains 3112 materials (metals, semiconductors, and insulators). For vibrational calculations, 3×3×3 supercells were used for single-atom unit cells and 2×2×2 supercells for multi-atom cells. The selection rule is sketched below.
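A minimal sketch of that selection rule, assuming each candidate is represented by a dict with hypothetical "nsites" and "band_gap" keys (in the actual study these properties come from the Materials Project):

```python
# Sketch of the MPStability selection rule described above.
# The dict keys below are illustrative placeholders for properties
# retrieved from the Materials Project.

def keep_for_mpstability(entry: dict) -> bool:
    """Keep unit cells with <= 4 atoms; for exactly 4 atoms,
    additionally require a band gap above 0.5 eV."""
    nsites = entry["nsites"]
    if nsites > 4:
        return False
    if nsites == 4 and entry["band_gap"] <= 0.5:
        return False
    return True

candidates = [
    {"material_id": "mp-0001", "nsites": 2, "band_gap": 0.0},  # metal, kept
    {"material_id": "mp-0002", "nsites": 4, "band_gap": 0.2},  # dropped
    {"material_id": "mp-0003", "nsites": 4, "band_gap": 1.3},  # kept
]
dataset = [e for e in candidates if keep_for_mpstability(e)]
print([e["material_id"] for e in dataset])  # ['mp-0001', 'mp-0003']
```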
Vibrational stability labeling: Phonon calculations used the finite-difference method. Displaced structures were generated with phonopy, and forces were computed with VASP using the plane-wave PAW method, the PBE-GGA functional, a 520 eV cutoff, a 10×10×10 Monkhorst–Pack k-point mesh, and an SCF energy tolerance of 10⁻⁵ eV. Force constants were obtained with phonopy, and the vibrational density of states (VDOS) was computed on an 8×8×8 q-mesh. Materials were labeled unstable if a significant density of imaginary phonon modes appeared in the VDOS. Consistency was verified against the DFPT data of Petretto et al. on 248 overlapping materials, with ~4% of materials classified differently.
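A minimal sketch of this labeling workflow, assuming the phonopy Python API and a user-supplied compute_forces callable standing in for the VASP force evaluations; the displacement distance and the imaginary-weight tolerance are illustrative assumptions, not values taken from the paper:

```python
import numpy as np
from phonopy import Phonopy

def label_vibrational_stability(unitcell, compute_forces,
                                supercell=(2, 2, 2), tol=1e-2):
    """Return 'stable' or 'unstable' from the phonon VDOS.

    unitcell       : PhonopyAtoms object for the relaxed structure
    compute_forces : callable mapping a displaced supercell to an
                     (natoms, 3) force array (e.g. a VASP run; placeholder)
    tol            : fraction of VDOS weight at negative frequencies
                     treated as "significant" (assumed value)
    """
    phonon = Phonopy(unitcell, supercell_matrix=np.diag(supercell))
    phonon.generate_displacements(distance=0.01)

    # One force calculation per displaced supercell (the expensive step).
    force_sets = [compute_forces(sc)
                  for sc in phonon.supercells_with_displacements]
    phonon.forces = force_sets
    phonon.produce_force_constants()

    # VDOS on an 8x8x8 q-mesh, as described in the paper.
    phonon.run_mesh([8, 8, 8])
    phonon.run_total_dos()
    dos = phonon.get_total_dos_dict()
    freqs, weights = dos["frequency_points"], dos["total_dos"]

    # Imaginary modes show up as negative frequencies in the VDOS.
    imaginary_fraction = weights[freqs < 0].sum() / max(weights.sum(), 1e-12)
    return "unstable" if imaginary_fraction > tol else "stable"
```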
Featurization: 1147 features per material were generated, comprising ROSA features (218), symmetry functions (600), symmetry-group one-hot features (SG; 230), and atomic features (97). ROSA descriptors are obtained from a single SCF iteration, extracting electronic eigenvalues and total energies. Symmetry functions capture translationally invariant geometric information. SG features one-hot encode the space group into 230 columns, and atomic features summarize statistics of elemental properties.
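Of these blocks, the space-group one-hot encoding is straightforward to illustrate; the sketch below assumes pymatgen's SpacegroupAnalyzer, while the ROSA and symmetry-function blocks are specific to the authors' pipeline and are not reproduced:

```python
import numpy as np
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

def space_group_one_hot(structure):
    """Return a length-230 one-hot vector encoding the space group of a
    pymatgen Structure (space groups are numbered 1..230)."""
    sg_number = SpacegroupAnalyzer(structure).get_space_group_number()
    vec = np.zeros(230)
    vec[sg_number - 1] = 1.0
    return vec
```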
Machine learning models and training: Random Forest (RF) and Gradient Boosting (GB) classifiers were evaluated; RF performed slightly better and was used for subsequent tasks. Five-fold stratified cross-validation was employed; in each iteration, four folds were used for training and one for testing. Class imbalance (982 unstable vs. 2130 stable) was addressed by augmenting only the training folds with synthetic minority samples using the two techniques below (see the sketch after this list):
- SMOTE: creating synthetic minority-class examples by interpolating between nearby minority samples in feature space.
- Mixup: hybridizing stable and unstable samples as r = (1 − λ)·x_unstable + λ·x_stable with λ ∈ [0, 0.2]. Labels were assigned by linearly combining the corresponding VDOS curves with the same weights and checking for peaks at negative (imaginary) frequencies: if present, the hybrid was labeled unstable, otherwise stable. No synthetic data were added to the test folds.
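A minimal sketch of this training protocol, assuming scikit-learn's RandomForestClassifier and StratifiedKFold together with imbalanced-learn's SMOTE; the toy X, y arrays and hyperparameters are placeholders, and the VDOS-based relabeling of mixup hybrids is not reproduced:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE

def mixup_unstable(x_unstable, x_stable, lam):
    """Mixup-style hybrid r = (1 - lam) * x_unstable + lam * x_stable with
    small lam (the paper uses lam in [0, 0.2]); the paper assigns the label
    by mixing the two VDOS curves in the same way and checking for imaginary
    peaks, which is not reproduced in this toy sketch."""
    return (1.0 - lam) * x_unstable + lam * x_stable

# Toy stand-ins for the real feature matrix and stable(0)/unstable(1) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (rng.random(300) < 0.32).astype(int)  # ~32% minority ("unstable") class

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    # Oversample the minority class in the training fold only; the test fold
    # keeps the original class distribution (no synthetic data).
    X_train, y_train = SMOTE(random_state=0).fit_resample(X[train_idx],
                                                          y[train_idx])

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    proba = clf.predict_proba(X[test_idx])[:, 1]
    print(f"fold {fold}: ROC-AUC = {roc_auc_score(y[test_idx], proba):.2f}")
```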
Evaluation and feature analysis: Performance metrics (precision, recall, F1, ROC-AUC) were computed on the test folds. Model calibration was assessed by comparing predicted class proportions with the true label distribution across folds. RF feature importances were used to identify the top 30 features; a reduced model using only these features was trained with the same hyperparameters to compare performance and assess descriptor significance. A confidence-based evaluation measured performance as a function of prediction-confidence threshold (per the Supplementary Methods). A sketch of these evaluation utilities follows.
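A sketch of these evaluation steps, assuming a fitted scikit-learn classifier clf with predict_proba and feature_importances_; X_test, y_test, and feature_names are placeholders, and only the 0.65 confidence threshold and the top-30 cut are taken from the paper:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def confidence_filtered_scores(clf, X_test, y_test, threshold=0.65):
    """Score only predictions whose top class probability >= threshold and
    report the fraction of the test set retained (coverage)."""
    proba = clf.predict_proba(X_test)
    confident = proba.max(axis=1) >= threshold
    y_pred = clf.classes_[proba.argmax(axis=1)][confident]
    p, r, f1, _ = precision_recall_fscore_support(
        np.asarray(y_test)[confident], y_pred, average=None, zero_division=0
    )
    return {"precision": p, "recall": r, "f1": f1,
            "coverage": float(confident.mean())}

def predicted_class_fractions(clf, X_test):
    """Predicted class proportions, for comparison against the true label
    distribution (the calibration check described above)."""
    y_pred = clf.predict(X_test)
    return {c: float(np.mean(y_pred == c)) for c in clf.classes_}

def top_k_features(clf, feature_names, k=30):
    """Rank features by RF impurity-based importance and return the top k,
    e.g. to retrain a reduced model on that subset."""
    order = np.argsort(clf.feature_importances_)[::-1][:k]
    return [feature_names[i] for i in order]
```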
Key Findings
Baseline (no synthetic augmentation): With vanilla RF and GB, unstable-class performance was limited due to imbalance. Across five folds (averages): RF precision 0.71 (unstable), 0.78 (stable); recall 0.42 (unstable), 0.92 (stable); F1 0.53 (unstable), 0.84 (stable). GB showed similar averages: precision 0.69/0.77, recall 0.42/0.91, F1 0.53/0.84. Maximum over folds for RF unstable-class F1 was 0.57; for GB 0.56.
With synthetic augmentation (SMOTE + mixup) using RF: Averaged over five folds, precision 0.60 (unstable), 0.84 (stable); recall 0.68 (unstable), 0.79 (stable); F1 0.63 (unstable), 0.81 (stable). Maximum across folds: precision 0.63/0.86; recall 0.73/0.82; F1 0.67/0.83. ROC-AUC improved from 0.68 (without synthetic data) to mean 0.73 with synthetic data.
Calibration: True label distribution averaged 32% unstable and 68% stable; model predicted on average 36% unstable and 64% stable across folds, indicating good calibration (≈4% difference).
Confidence filtering: For predictions with confidence ≥ 0.65, the unstable-class average precision, recall, and F1 improved to ~0.70, 0.71, and 0.70, respectively, while still covering ~65% of data points.
Feature importance: A reduced RF using only the top 30 features achieved average scores similar to the model trained on the full feature set, indicating that most of the predictive signal is concentrated in a small subset. BACD and ROSA descriptors contributed most, followed by SG features. Features consistently important across folds included std_average_anionic_radius and metals_fraction.
Discussion
The study demonstrates that a machine learning classifier trained on a curated dataset of finite-difference phonon calculations can effectively identify vibrationally unstable materials, addressing a key synthesizability criterion missing from current database filters that rely primarily on convex hull (thermodynamic) stability. Synthetic data augmentation substantially improved minority (unstable) class detection and overall discrimination (AUC 0.73). Good calibration suggests reliable class proportion predictions on unseen data. Confidence-based operation allows users to trade coverage for higher accuracy. The ability to detect onset of imaginary frequencies has broader implications, including estimating ideal strength under strain, aiding transition-state searches in molecular systems, and screening for polar and potentially ferroelectric materials. Concentration of predictive power in a small set of descriptors (notably BACD and ROSA) supports efficient deployment in high-throughput settings.
Conclusion
The work establishes a complete workflow and dataset for predicting vibrational stability of inorganic crystals using machine learning. A new MPStability dataset (3112 materials) with phonon-derived stability labels was generated via finite-difference calculations. Using a feature set combining ROSA, geometric symmetry functions, symmetry-group encodings, and atomic descriptors, a random forest classifier trained with stratified cross-validation and synthetic augmentation achieved substantially improved detection of unstable materials (unstable-class F1 ≈ 0.63; AUC ≈ 0.73), with further gains at higher confidence thresholds. Feature analysis shows that a compact subset of descriptors suffices for strong performance. These models can serve as a rapid pre-filter within materials databases to complement thermodynamic convex-hull criteria and guide experimental synthesis efforts. Future work should expand the training data to include larger unit cells and broader chemistries to improve extrapolation and generalizability.
Limitations
The dataset was constructed by sampling materials constrained by small unit-cell sizes (≤4 atoms) and, for 4-atom cells, a bandgap threshold (>0.5 eV). This introduces distributional differences from the broader Materials Project dataset and limits the model’s ability to extrapolate to materials with larger unit cells or different distributions. Consequently, predictions on arbitrary materials may be less accurate. Improving extrapolation requires expanding the training set to include materials with larger unit cells and more diverse structures.