Unveiling the complex structure-property correlation of defects in 2D materials based on high throughput datasets


P. Huang, R. Lukin, et al.

The 2D Material Defect (2DMD) datasets characterize the defect properties of 2D materials through high-throughput DFT calculations. This research, conducted by Pengru Huang, Ruslan Lukin, Maxim Faleev, Nikita Kazeev, Abdalaziz Rashid Al-Maeeni, Daria V. Andreeva, Andrey Ustyuzhanin, Alexander Tormasov, A. H. Castro Neto, and Kostya S. Novoselov, aims to provide a data-driven understanding of defect behavior and to support machine learning models for materials design.

Introduction
The paper addresses the lack of machine learning-friendly databases focused on defects in two-dimensional materials, despite the importance of defect engineering for tailoring mechanical, thermal, electronic, and optical properties. The authors present a new database and datasets (2DMD) for defects in 2D materials, aiming to enable data-driven understanding and machine learning applications. They highlight the opportunity offered by 2D materials to controllably modify properties via adatoms, substitutions, and vacancies, and the challenge posed by the vast combinatorial space of host materials, defect components, and configurations. The research purpose is to create structured and dispersive high-throughput DFT datasets that capture structure–property correlations of defects, enabling analysis (including ML) and providing guidelines for defect engineering, with an initial comprehensive analysis on MoS2.
Literature Review
The study situates itself within the materials genome and high-throughput computation ecosystem. Established computational databases include Materials Project, OQMD, AFLOW, and NOMAD. Graph neural network-based ML models such as MEGNet, CGCNN, SchNet, and GemNet have advanced property prediction for materials. Prior ML efforts on defects are relatively scarce and have focused on point-defect properties in 2D materials, defect migration and formation energies in alloys, and defect dynamics in 2D TMDCs. A recent related effort is the QPOD database (Bertoldo et al.) containing 503 defect structures across 82 2D materials with thermodynamic and electronic properties; however, its size and density are limited for efficient ML. The authors identify a need for larger, denser, and structurally designed defect datasets to improve ML performance and transferability.
Methodology
Dataset generation: The authors created two complementary dataset groups. The structured dataset enumerates symmetrically inequivalent single-, double-, and triple-site defects in 8×8 monolayer supercells. For MoS2 the defect components are Mo vacancies, S vacancies, W substitutions (on Mo), and Se substitutions (on S); an analogous dataset was generated for WSe2 with Mo and S substitutions. This yields 5,933 configurations for MoS2 and 5,933 for WSe2. The dispersive dataset samples high-density defects by randomly generating combined vacancy and substitution defects at concentrations of 2.5%, 5%, 7.5%, 10%, and 12.5% for MoS2, WSe2, hBN, GaSe, InSe, and black phosphorus (100 structures per concentration per material, 500 per material, 3,000 in total). All datasets include relaxed atomic structures, densities of states (DOS), and band structures, and are available online.
DFT calculations: Density functional theory with the PBE GGA functional was used as implemented in VASP. The PAW method described the ion–electron interaction, with a plane-wave cutoff of 500 eV. Spin polarization was included. Because the supercells are large, structure relaxations used Γ-point-only sampling, while denser Monkhorst–Pack grids were used for electronic properties. A vacuum spacing of at least 15 Å avoided interlayer interactions. Structural relaxations proceeded until forces were below 0.01 eV/Å and the energy was converged to 10⁻⁵ eV. For systems with unpaired electrons, collinear spin-polarized calculations with high-spin ferromagnetic initialization were used. Spin–orbit coupling (SOC) and charged defect states were not included.
Property definitions: The formation energy is Ef = ED − Epristine − Σ ni μi, where ED and Epristine are the total energies of the defective and pristine structures, ni is the number of atoms of element i exchanged with the reservoirs, and μi is the chemical potential of element i. The interaction energy of a defect complex is Eint = Ef(complex) − Σ Ei, where Ef(complex) is the formation energy of the complex and Ei are the formation energies of its component single-site defects; a negative value indicates attraction.
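The two property definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the sign convention for ni (positive for atoms added to the pristine supercell, negative for atoms removed), and the total energies and chemical potential in the example are assumptions chosen only for illustration. The single-defect formation energies (7.12 eV and 2.65 eV) are the MoS2 values quoted in this summary; the formation energy of the complex is hypothetical.

```python
from typing import Dict, Iterable


def formation_energy(e_defect: float, e_pristine: float,
                     delta_n: Dict[str, int], mu: Dict[str, float]) -> float:
    """E_f = E_D - E_pristine - sum_i n_i * mu_i.

    Assumed convention: delta_n[el] is the number of atoms of element `el`
    added to the pristine supercell (negative for removed atoms, so one
    vacancy corresponds to -1).
    """
    exchanged = sum(n * mu[element] for element, n in delta_n.items())
    return e_defect - e_pristine - exchanged


def interaction_energy(e_f_complex: float, e_f_singles: Iterable[float]) -> float:
    """E_int = E_f(complex) - sum of single-site formation energies.

    A negative result means the component defects attract each other.
    """
    return e_f_complex - sum(e_f_singles)


# Hypothetical total energies (eV) and chemical potential, for illustration only.
e_f_vacancy = formation_energy(
    e_defect=-990.0, e_pristine=-1000.0,
    delta_n={"S": -1},          # one S atom removed
    mu={"S": -4.0},
)

# Single-defect values quoted in this summary; the complex value is hypothetical.
e_int = interaction_energy(e_f_complex=9.0, e_f_singles=[7.12, 2.65])
```

With these inputs the hypothetical vacancy has a formation energy of 6.0 eV, and the complex has an interaction energy of about −0.77 eV, which would indicate attraction between the two vacancies.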
Defect electronic levels were characterized via the highest occupied (HOMO) and lowest unoccupied (LUMO) Kohn–Sham states, referenced to the pristine valence band maximum (VBM) by aligning the deepest orbital energies. Values were extracted at Γ because the defect states are localized and their bands are flat. The authors note that PBE underestimates band gaps but argue that trends in Ef, HOMO, and LUMO are transferable.
Modeling insights: A simplified two-orbital picture was used to rationalize the oscillatory interactions between defect states (bonding/antibonding formation via the overlap S and the direct Coulomb K and exchange J integrals) and their dependence on lattice symmetry and sublattices in honeycomb-like TMDCs.
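The random generation of the dispersive dataset described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the summary does not say which supercell size the dispersive dataset uses, how the concentration is counted, or how vacancies and substitutions are mixed. Here an 8×8 MoS2 supercell (64 Mo + 128 S sites) is assumed, the concentration is taken as a fraction of all sites, and each chosen site becomes a vacancy or a substitution with equal probability. The function name `sample_defects` is hypothetical.

```python
import random
from typing import Dict, List, Tuple


def sample_defects(sites: List[Tuple[int, str]],
                   concentration: float,
                   substitutions: Dict[str, str],
                   rng: random.Random) -> Dict[int, str]:
    """Pick round(concentration * n_sites) distinct sites and assign a defect.

    Returns {site_index: "vacancy" or the substituting element}.
    """
    n_defects = round(concentration * len(sites))
    chosen = rng.sample(sites, n_defects)  # sampling without replacement
    defects = {}
    for index, element in chosen:
        options = ["vacancy"]
        if element in substitutions:
            options.append(substitutions[element])
        defects[index] = rng.choice(options)
    return defects


# Assumed host: 8x8 MoS2 monolayer supercell with 64 Mo and 128 S sites.
sites = [(i, "Mo") for i in range(64)] + [(64 + i, "S") for i in range(128)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

concentrations = [0.025, 0.05, 0.075, 0.10, 0.125]
dataset = {
    c: [sample_defects(sites, c, {"Mo": "W", "S": "Se"}, rng) for _ in range(100)]
    for c in concentrations
}
n_structures = sum(len(structures) for structures in dataset.values())
```

Under these assumptions the loop produces 100 structures per concentration and 500 for the material, matching the counts quoted above; repeating it for six hosts would give 3,000.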
Key Findings
- Dataset scale and composition: A structured dataset of 11,866 configurations (5,933 MoS2 + 5,933 WSe2) and a dispersive dataset of 3,000 configurations across MoS2, WSe2, hBN, GaSe, InSe, and black phosphorus (BP).
- Formation energy ranges reflect interaction amplitudes and defect severity: V3-type defects (one Mo vacancy plus two S vacancies) span ~4.121 eV; complexes with one Mo vacancy and one S vacancy vary by ~2.282 eV; two S-vacancy combinations vary by ~0.1 eV; substitution-dominated defects vary by only a few tens of meV.
- Absolute formation energies (MoS2): Mo vacancy ≈ 7.12 eV; S vacancy ≈ 2.65 eV; W→Mo substitution ≈ 0.167 eV; Se→S substitution ≈ 0.279 eV.
- Electronic levels: Vacancy defects create deep levels 0.1–0.4 eV inside the gap; substitutional defects introduce states within the bands without significantly affecting the band edges (changes on the order of 10 meV).
- Binary property map (band gap vs formation energy): Across the dataset, the band gap decreases as the formation energy increases, converging near ~0.3 eV for high Ef. Substitutions preserve the pristine MoS2 band gap (~1.81 eV), single S vacancies yield gaps of ~1.1 eV with Ef ~3.0 eV, and double S vacancies have Ef ~5.5 eV with band gaps spanning ~0.6 eV depending on vacancy separation. Deep gap states typically require Mo vacancies; defects involving Mo sites have band gaps mainly <0.5 eV and Ef spanning ~7–13 eV.
- Fingerprints across materials: Property maps for high-density defects retain nontrivial trends and serve as material-specific fingerprints. MoS2 and WSe2 show similar features, as do GaSe and InSe, while the maps otherwise differ distinctly between hosts.
- Magnetism: No magnetic defects were found in the MoS2 and WSe2 datasets; magnetic defects appear in GaSe, InSe, BP, and C-doped hBN. Exchange splitting leads to asymmetric band-gap distributions, with a trend toward a larger majority-spin gap distribution, especially in BP and C-doped hBN. Magnetic moments in C-doped hBN follow Lieb’s theorem, S = [NC(N) − NC(B)]/2, and can be large when the sublattices are imbalanced.
- Quantum oscillations in defect interactions: For V2 defects (Mo vacancy + S vacancy), the interaction energy, HOMO, and LUMO oscillate with vacancy separation, with pronounced minima at the 1st, 3rd, 6th, and 10th nearest S sites (triangular numbers), aligned along zigzag directions. Wavefunction resonance and sublattice structure control the hybridization strength, explaining the fluctuations in stabilization energy and the corresponding shifts in defect levels.
- Practical guidance: To introduce shallow states, create single or double S vacancies; to generate deep levels, include Mo vacancies; substitutions (W, Se) minimally perturb the electronic structure.
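Two of the findings above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical. The spin formula is the one quoted in this summary, with NC(N) and NC(B) read as the numbers of carbon atoms on nitrogen and boron sites. The conversion to a moment of 2|S| Bohr magnetons is a spin-only (g = 2) assumption and is not taken from the summary.

```python
from typing import List


def lieb_spin(n_c_on_n: int, n_c_on_b: int) -> float:
    """Total spin S = [N_C(N) - N_C(B)] / 2 for C-doped hBN.

    The sign only records which sublattice hosts more carbon atoms.
    """
    return (n_c_on_n - n_c_on_b) / 2


def spin_only_moment(spin: float) -> float:
    """Magnetic moment in Bohr magnetons, assuming a spin-only g-factor of 2."""
    return 2 * abs(spin)


def triangular_numbers(count: int) -> List[int]:
    """First `count` triangular numbers T_k = k (k + 1) / 2."""
    return [k * (k + 1) // 2 for k in range(1, count + 1)]


# Three C atoms on N sites and one on a B site: imbalance of two.
spin = lieb_spin(n_c_on_n=3, n_c_on_b=1)      # 1.0
moment = spin_only_moment(spin)               # 2.0 Bohr magnetons

# Neighbor shells where the V2 interaction minima are reported.
minima_shells = triangular_numbers(4)         # [1, 3, 6, 10]
```

Equal numbers of carbon atoms on the two sublattices give S = 0, which is why a large moment requires sublattice imbalance.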
Discussion
The findings demonstrate that carefully structured high-throughput DFT datasets can reveal robust structure–property correlations for defects in 2D materials. The observed trend of decreasing band gap with increasing formation energy provides a practical guideline for defect engineering in TMDCs, enabling targeted introduction of shallow or deep states depending on desired functionality. The hierarchical influence of defect type (substitution vs S vacancy vs Mo vacancy) clarifies how strongly the lattice and electronic structure are perturbed. The persistence of nontrivial features in property maps across concentrations suggests these distributions can act as fingerprints for different 2D hosts, supporting transferability and comparison across materials. The oscillatory interaction energies and defect level shifts as functions of defect separation and lattice direction are explained via a simplified two-orbital quantum model, connecting lattice symmetry, wavefunction overlap, and exchange interactions to stabilization energies. Collectively, these insights address the initial goal of enabling machine learning-ready datasets with physical interpretability, offering both training data and physics-based understanding to inform model design and defect engineering strategies.
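The summary does not reproduce the expression used in the two-orbital model. As a hedged illustration, the sketch below uses the textbook Heitler–London-type form, in which the symmetric and antisymmetric combinations of two localized orbitals have energies E± = (K ± J)/(1 ± S²) relative to the isolated orbitals, with S the overlap, K the direct Coulomb integral, and J the exchange integral. This form, the function name, and the numerical inputs are assumptions, not values from the paper.

```python
from typing import Tuple


def two_orbital_levels(k_coulomb: float, j_exchange: float,
                       overlap: float) -> Tuple[float, float]:
    """Return (E_plus, E_minus) = ((K + J) / (1 + S^2), (K - J) / (1 - S^2)).

    Energies are measured relative to two non-interacting orbitals.
    Assumes the textbook Heitler-London form; requires |S| < 1.
    """
    if abs(overlap) >= 1:
        raise ValueError("overlap must satisfy |S| < 1")
    e_plus = (k_coulomb + j_exchange) / (1 + overlap ** 2)
    e_minus = (k_coulomb - j_exchange) / (1 - overlap ** 2)
    return e_plus, e_minus


# Hypothetical integrals in eV: a negative exchange integral stabilizes
# the symmetric (bonding-like) combination.
e_plus, e_minus = two_orbital_levels(k_coulomb=0.1, j_exchange=-0.3, overlap=0.5)
splitting = e_minus - e_plus

# With no overlap and no exchange the two levels stay degenerate.
e_plus_far, e_minus_far = two_orbital_levels(0.1, 0.0, 0.0)
```

In this picture the stabilization energy and the level splitting follow S and J, both of which depend on the distance and relative sublattice positions of the two defects. That is consistent with the oscillations described above, although the sketch does not model the lattice itself.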
Conclusion
The authors introduce the 2DMD datasets, machine learning-friendly high-throughput DFT datasets of defects in representative 2D materials, and demonstrate how a structured approach uncovers nontrivial structure–property correlations. They produced 11,866 configurations for TMDCs (MoS2, WSe2) covering single, double, and triple defects, and 3,000 high-density configurations across six 2D materials. Property maps reveal a general trend of band gap reduction with increasing formation energy, along with material-specific fingerprints. Quantum oscillations in defect interactions are rationalized by a simple two-orbital model. These datasets and insights provide actionable guidance for defect engineering (e.g., using S vacancies for shallow states and Mo vacancies for deep levels) and a foundation for developing efficient, interpretable ML models. Future work includes adding datasets for more materials and defect types to expand coverage and enable scalable, transferable prediction and design of materials with predetermined properties.
Limitations
- Electronic structure methodology: Use of PBE GGA underestimates band gaps; while trends are expected to transfer, quantitative level alignment would benefit from hybrid functionals or many-body methods (at higher computational cost).
- Physics included: Spin–orbit coupling (SOC) and charged defect states were not included, which can affect defect levels and magnetic properties in some 2D materials.
- Data coverage: Despite high throughput, the defect space is vast; practical datasets remain “small data” relative to the combinatorial space. The dispersive sampling strategy leads to sparsity in some regions.
- Supercell and k-point limitations: Finite-size effects and Γ-point sampling for relaxations may influence subtle features despite large supercells.