
Physics
Machine learning and evolutionary prediction of superhard B-C-N compounds
W. Chen, J. N. Schmidt, et al.
This groundbreaking research, conducted by Wei-Chih Chen, Joanna N. Schmidt, Da Yan, Yogesh K. Vohra, and Cheng-Chien Chen, showcases the power of random forests models to predict superhard materials from chemical formulas. The study reveals that a 1:1 B-N ratio can lead to several dynamically stable superhard compounds, pushing the boundaries of materials science through innovative machine learning techniques.
~3 min • Beginner • English
Introduction
Superhard materials (Vickers hardness H ≥ 40 GPa) are essential for applications such as abrasives, cutting tools, and protective coatings. Diamond is the hardest known material but has limitations (cost, size, chemical reactivity with Fe-group elements, and poor performance in oxidizing environments). Light-element compounds involving B, C, N, and O can form short covalent bonds conducive to superhardness; examples include c-BN and various boron carbides and B–C–N ternaries (e.g., BC₂N, BC₄N). However, the compositional/structural search space is vast, making exploration challenging. First-principles DFT can predict superhard compounds but is computationally intensive. Data-driven machine learning (ML) offers efficient large-scale screening, but many prior models rely on features like cohesive energy, volume, or crystal symmetry that are not available a priori for new compositions. The study aims to develop ML models that use only features derivable from chemical formula to predict bulk and shear moduli (and hence hardness), apply them to map the B–C–N compositional space, and validate promising candidates with evolutionary crystal structure prediction and DFT.
Literature Review
Prior work has demonstrated ML’s utility in materials discovery, including large-scale screening of ternaries for stability (Meredig et al.). For elastic property prediction, gradient boosting and SVM approaches have achieved high accuracy using features such as volume per atom, cohesive energy, and structural descriptors (e.g., de Jong et al., Mansouri et al.). However, reliance on structural/thermodynamic features limits applicability to unexplored compounds. Empirical hardness models (e.g., Tian’s model) relate hardness to bulk/shear moduli and provide a route to estimate H from K and G. Superhard materials in B–C–N–O systems (e.g., BC₂N, BC₄N, BC₅, B₂CO phases) have been synthesized or predicted, and evolutionary algorithms (USPEX) have been effective for structure prediction of hard phases. These works motivate a formula-only feature approach to efficiently explore ternary B–C–N superhard materials and validate via CSP and DFT.
Methodology
Data acquisition: Target properties (bulk modulus K and shear modulus G) were extracted from the Materials Project via the PYMATGEN API. Selection criteria included: removal of samples with formation energy ≥0.2 eV/atom; exclusion of layered/quasi-2D materials with |Voigt−Reuss| > 50 GPa; filtering by Pugh’s ratio k = G/K to remove k < 0.25 (very soft) and k > 4.0 (extreme high-hardness, often high-pressure phases). Additionally, outliers were removed by constraining K and G to 0–550 GPa. This yielded 10,421 samples. The dataset was split into 90% train/validation and 10% test sets.
Feature generation: 60 composition-derived features were constructed without structural or electronic inputs. These include: (1) Stoichiometric features: L^p norms of atomic fractions for p = 0 (number of components), 2, and 3. (2) Elemental property statistics: for each of 10 properties (atomic number, atomic mass, periodic table column and row, atomic radius, electronegativity, and counts of s, p, d, f valence electrons), five statistics were computed across constituent elements—minimum, maximum, range, fraction-weighted mean, and mean absolute deviation (total 50 features). (3) Orbital-occupation ratios: fraction-weighted average of s, p, d, f valence electrons divided by the total valence electrons (4 features). (4) Ionic features: Boolean feasibility of a neutral ionic compound (from common oxidation states), maximum bond ionic character, and mean ionic character based on Pauling electronegativity differences (3 features).
Model training and validation: Two random forests regressors (Scikit-learn) were trained separately for K and G using 100 estimators and pre-pruned to a maximum depth of 12 (selected via grid search with tenfold cross-validation) to balance bias-variance and avoid overfitting. Model performance was evaluated on the held-out test set using the Pearson correlation coefficient r. Hardness H was not directly regressed; instead, H was computed from predicted K and G using Tian’s empirical model.
Application to B–C–N: The trained models were applied to enumerated B–C–N compositions to generate ternary (triangular) maps of predicted K, G, and H. Candidates with H ≥ 40 GPa were identified for further study.
Crystal structure prediction (CSP): USPEX evolutionary algorithm was employed under 15 GPa external pressure to bias toward denser, harder phases. Initial single-formula-unit compositions with even valence electron counts (e.g., BC₃N, BC₅N, BC₆N, BC₃N₂, B₂C₃N, B₄C₅N₂, etc.) were explored. At least 1000 structures per composition were generated via random initialization and evolutionary operations (heredity, soft mutation, transmutation). This search identified a diamond-like sp³ structure for B₂C₃N.
First-principles calculations: VASP (PAW, PBE-GGA) with a 520 eV plane-wave cutoff, Γ-centered Monkhorst–Pack k-mesh 21×21×5 (≈0.02×2π Å⁻¹ resolution). Self-consistent and relaxation tolerances were 10⁻⁵ eV per cell and 10⁻³ eV/Å. Elastic constants C_ij were obtained via the stress–strain method and converted to Voigt–Reuss–Hill averaged K and G. Phonon dispersions were computed using PHONOPY with DFPT on 2×2×1 supercells. Candidate structures for BC₁₀N and B₄C₅N₃ were generated by BN and B substitutions in a 12-atom diamond supercell and checked via additional USPEX runs; the lowest-enthalpy structures share trigonal symmetry (hexagonal setting P3m1, No. 156). Mechanical stability was verified against Born criteria. Electronic band structures and phonons were analyzed to confirm dynamic stability and characterize electronic character.
Key Findings
- Dataset characteristics: 10,421 compounds after filtering; median K_DFT ≈ 84 GPa and G_DFT ≈ 40 GPa.
- Model performance (test set): Bulk modulus r = 0.940 (R² ≈ 0.885); shear modulus r = 0.907 (R² ≈ 0.822). Hardness predicted via Tian’s model from RF outputs shows r ≈ 0.789 relative to DFT-derived H.
- Feature importance/correlations: Atomic radius and d-electron occupation are most informative. Average atomic radius vs. bulk modulus r ≈ −0.24; largest atomic radius vs. bulk modulus r ≈ −0.41 (smaller cells favor higher K). d-electron occupation correlates positively with bulk modulus (r ≈ +0.58), consistent with high incompressibility of many transition-metal borides.
- B–C–N ternary maps: Predicted maps show a broad superhard region along compositions with B:N ≈ 1:1. Example ML hardness predictions: BC₂N H_RF ≈ 74 GPa; BC₄N H_RF ≈ 65 GPa, consistent with reported superhardness (BC₂N ≈ 62–76 GPa; BCN ≈ 68 GPa). Pure-carbon vertex corresponds to diamond-like H_RF ≈ 90 GPa; pure-boron region ≈ 30 GPa. An artificial high-H region near nitrogen-rich corner arises from large k = G/K in low-K phases when using Tian’s model.
- Proposed superhard compositions and properties (DFT, P3m1):
• BC₁₀N: Dynamically and mechanically stable; wide band gap ≈ 3.5 eV (insulator). DFT elastic properties: K = 417 GPa, G = 487 GPa, E = 1052 GPa, k = 1.166, ν = 0.080, A^U = 0.058. H_DFT ≈ 87 GPa; formation energy ΔE ≈ 79.5 meV/atom. RF predicted K_RF = 379 GPa, G_RF = 422 GPa, H_RF ≈ 75 GPa.
• B₄C₅N₃: Dynamically stable; metallic. K = 370 GPa, G = 368 GPa, E = 829 GPa, k = 0.995, ν = 0.125, A^U = 0.221. H_DFT ≈ 60 GPa; ΔE ≈ 141.3 meV/atom. RF: K_RF ≈ 359 GPa, G_RF ≈ 369 GPa, H_RF ≈ 62 GPa.
• B₂C₃N: Dynamically stable; metallic; hexagonal superlattice along diamond (111). K = 354 GPa, G = 298 GPa, E = 697 GPa, k = 0.840, ν = 0.172, A^U = 0.792. H_DFT ≈ 43 GPa; ΔE ≈ 155.9 meV/atom.
- All three have negative Cauchy pressures (C12 − C44 < 0: −405, −209, −121 GPa for BC₁₀N, B₄C₅N₃, B₂C₃N), indicating strong covalent bonding and brittleness; Pugh’s ratios > 0.571 further indicate brittleness. Increasing B/N content softens phonons and reduces K, G, and H. Density correlates positively with mechanical strength, while Poisson’s ratio and universal elastic anisotropy index correlate negatively with stiffness/hardness.
- Synthesis implication: Although ΔE > 0 for all, BC₁₀N has the lowest formation energy among the proposed compounds and lower than BC₂N/BCN, suggesting potential synthesis without extreme P–T conditions (e.g., low-temperature plasma methods).
Discussion
Using only composition-derived features, the random forests models accurately predict bulk and shear moduli, enabling rapid mapping of the B–C–N compositional space and identifying superhard regions without requiring structural information. The approach directly addresses the challenge of large design space by providing an efficient pre-screen for superhard candidates. The ternary maps reveal that compositions with B:N ≈ 1:1 tend to maximize predicted hardness within B–C–N, consistent with prior experimental observations for BC₂N/BCN and guiding targeted exploration. Evolutionary CSP and DFT validations confirm that several ML-suggested compositions (BC₁₀N, B₄C₅N₃, B₂C₃N) are mechanically and dynamically stable with superhard H ≥ 40 GPa and that BC₁₀N, in particular, achieves diamond-like hardness while being an insulator with a sizable band gap. The negative Cauchy pressure and high Pugh’s ratios corroborate strong directional covalent bonding underlying hardness, while trends in phonon softening and mechanical moduli with increased B/N content rationalize the observed decrease in hardness away from C-rich compositions. Collectively, the ML→CSP→DFT pipeline demonstrates an effective framework for compositional discovery of superhard materials and provides concrete candidates with feasible synthesis routes (e.g., low-temperature plasma) for experimental verification.
Conclusion
The study introduces composition-only random forests models that accurately predict elastic moduli and, via Tian’s relation, hardness, enabling large-scale, structure-agnostic discovery of superhard materials. Applied to the B–C–N system, the models identify a superhard region near B:N ≈ 1:1. Three compositions—BC₁₀N (insulating, H ≈ 87 GPa), B₄C₅N₃ (metallic, H ≈ 60 GPa), and B₂C₃N (metallic, H ≈ 43 GPa)—were validated by CSP and DFT to be mechanically and dynamically stable. BC₁₀N has relatively low formation energy, suggesting potential synthesis under non-extreme conditions such as low-temperature plasma, with broad application prospects in extreme environments. Future work could focus on experimental synthesis and characterization of these phases, extension of the composition-only ML framework to other light-element systems (e.g., B–N–O, B–C–O), incorporation of uncertainty quantification, and exploration of additional properties (e.g., fracture toughness, thermal stability, potential superconductivity in metallic B₄C₅N₃ and B₂C₃N).
Limitations
- Features exclude structural and thermodynamic descriptors (e.g., volume, crystal symmetry, cohesive energy) that could improve accuracy; thus, predictions may be less precise for classes where such features are critical.
- Hardness is not directly learned but inferred from predicted K and G using empirical models (Tian), whose applicability may be limited in low-K/G regions and can produce artifacts (e.g., high predicted H near nitrogen-rich corner due to large k = G/K).
- Data filtering excludes layered/quasi-2D materials and extreme high-pressure phases, limiting generalizability to those regimes.
- CSP searches focused on specific unit cell sizes and even-electron counts; some substitution routes (e.g., nitrogen substitution) were avoided due to thermodynamic instability, potentially missing viable metastable structures.
- Formation energies for proposed phases are positive (metastable with respect to elements/compounds used as references), implying synthesis may depend on kinetics and processing routes; DFT (PBE) approximations may affect quantitative energetics and band gaps.
- The nitrogen-corner artifact on ternary maps indicates sensitivity to data selection thresholds (e.g., requiring K, G > 50 GPa) that trade off between avoiding artifacts and biasing overall mechanical property estimates.
Related Publications
Explore these studies to deepen your understanding of the subject.