Engineering and Technology
Machine learning enabled autonomous microstructural characterization in 3D samples
H. Chan, M. Cherukara, et al.
Characterization of microstructural and nanoscale features in full 3D samples is vital across technologies, as microstructural features (e.g., grain size distributions in metals, voids/porosity in polymers, hierarchical assemblies in soft matter) strongly correlate with material properties. Conventional practice largely relies on 2D characterization and inference of 3D information from slices, which is inefficient and lossy. Direct, robust 3D classification for arbitrary polycrystalline microstructures is therefore highly desirable, especially given advances in 3D and 4D imaging (tomography, HEDM, coherent diffraction).

Polycrystalline materials consist of many grains with nearly identical lattice structure but different orientations; grain boundaries separate the grains, and the average grain size and its distribution critically influence mechanical, optical, chemical, and thermal properties (e.g., the Hall–Petch relation). Prior studies show that the breadth of the grain size distribution affects strength, motivating accurate microstructural quantification for property prediction and materials design.

Existing approaches and standards (ASTM) for 2D grain identification can be accurate but are sensitive to intersection criteria and non-uniform distributions, and often require tedious, subjective measurements; automation faces variability in imaging contrast and etching, though EBSD helps reduce subjectivity. Automated 2D methods include supervised CNNs and unsupervised clustering/Voronoi-based techniques: supervised methods are accurate but data- and system-specific, while unsupervised methods can perform well given a priori information and tuned hyperparameters but inherit the same specificity; density-only unsupervised methods are general but less accurate. Extending 2D methods to 3D via slice stacks is non-trivial, can be orientation- and sampling-dependent, and is time-consuming. With brighter, coherent X-ray sources enabling 4D imaging, rapid and accurate segmentation is crucial for real-time characterization.
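For reference, the Hall–Petch relation mentioned above takes the standard empirical form linking yield strength to average grain diameter d:

```latex
\sigma_y = \sigma_0 + \frac{k_y}{\sqrt{d}}
```

where \sigma_0 is the friction (lattice) stress and k_y the strengthening coefficient; smaller grains present more boundary area to impede dislocation motion, raising strength, which is why accurate grain size distributions matter for property prediction.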
The authors present a method combining topology classification, image processing, and unsupervised ML to rapidly characterize 3D microstructures (grains, voids, micelles) across inorganic and soft materials, insensitive to extended defects and suitable for real-time facility data.
The paper reviews standards and prior methods for microstructure characterization, particularly grain size analysis: ASTM 2D methods (matching, planimetric, intercept) can achieve accuracy and reproducibility of about ±0.25 grain-size units but are sensitive to intersection criteria and non-uniform distributions, and often require manual, subjective measurements; EBSD has been proposed to reduce subjectivity in boundary identification. Automated 2D techniques include supervised CNN-based segmentation (high accuracy after training, but dataset- and material-specific) and unsupervised approaches (histogram thresholding, watershed, k-means, Voronoi) that can perform comparably when provided a priori information (e.g., number of grains, orientations) and tuned hyperparameters, but then also become system-specific. Methods relying solely on local density generalize better across materials and techniques at some cost to accuracy. Extending these 2D approaches to 3D via stacks is affected by slice number and orientation and is time-consuming. For 3D/4D imaging modalities (DCT, Laue diffraction, HEDM), segmentation remains difficult when contrast is faint, necessitating clustering, deformable models, or gradient-based methods with varying success. The literature thus indicates a need for fast, general, reliable, and accurate 3D grain and inclusion identification suitable for real-time analysis across materials and imaging modalities.
The approach integrates three main processes: (1) preconditioning and topology-based classification; (2) unsupervised ML clustering; and (3) refinement and back-mapping.

Process 1: Preconditioning and topology classification
- Local structure identification distinguishes microstructures (e.g., grains) from boundaries. For atomistic polycrystals: use common neighbor analysis (CNA) for fcc, bcc, hcp and extended CNA for diamond/ice (hexagonal/cubic) to assign local structure labels. Amorphous/unlabeled atoms are excluded from grain analysis.
- For soft materials, labeling can leverage atom types, bond topology, local charges, or chemical identity.
- Voxelization: Convert labeled atoms/beads to a voxel grid based on number densities to enable efficient image processing; apply image filters (uniform blur, local variance) and thresholding to enhance the separation between microstructures and boundaries.

Process 2: Unsupervised ML clustering
- Cluster voxels with similar local structure labels into individual microstructures; outputs include number of clusters (e.g., grains) and their volumes (size distribution). Assign unique cluster labels for visualization.
- Choice of clustering algorithm depends on prior knowledge; density-based DBSCAN is used throughout because cluster counts are unknown and cluster shapes are irregular. DBSCAN hyperparameters: the neighborhood cutoff ε is limited to first-nearest voxels; Nmin starts strict (27 in 3D, 9 in 2D) and is relaxed until the total cluster count is maximized.

Process 3: Refinement and back-mapping
- Refinement via label propagation/spreading: assign unlabeled voxels near boundaries to neighboring cluster labels of maximum occurrence, prioritizing voxels near smaller microstructures to improve size estimates.
- Back-mapping: For atomistic systems, assign each atom the cluster label of its containing voxel to recover atomistic representations.

Case-specific preconditioning examples
- Metals (Al fcc, Fe bcc, Si diamond, Ti hcp): CNA/extended CNA; voxel bin sizes (approx.): 4.5 Å (Al), 4.1 Å (Fe), 4.0 Å (Si), 4.4 Å (Ti); 40th-percentile thresholding of non-zero voxels to exclude boundary voxels prior to clustering.
- Polycrystalline ice (CG water): extended CNA for hexagonal/cubic/stacking-disordered phases; voxel bin 5 Å; voxelization on number densities of cubic and hexagonal beads; per-frame analysis on trajectory.
- Polymers (polysiloxane, polyethylene): voxelization based on atom number density; bin size 3 Å to sample voids; clustering to identify void spaces and compute volume distributions.
- Reverse micelles (CG): voxelization on water bead number densities; due to 4:1 CG ratio, bin size 8 Å; cluster water within micelles to obtain size distribution.
- Experimental superalloy (IN100) 3D images (serial-section EBSD reconstruction): more noise and artifacts; a local variance filter detects grain boundaries in both bright-field and dark-field images, followed by thresholding, clustering, and refinement. Down-sampling (0.5×, 0.25× resolution) accelerates processing at the cost of reduced small-feature sensitivity.

Error sensitivity and efficiency considerations
- Robustness tested by randomly perturbing local structure labels (noise) and varying voxel bin sizes; down-sampling and local variance/uniform filtering improve resilience. Threshold choice remains sensitive; empirically, ~90th-percentile thresholding of non-zero voxels works well for simulation data with local variance filtering, and ~40th-percentile for experimental images.
- Computational complexity: voxelization is O(n); DBSCAN clustering in 3D is O(n log n) using a k-d tree for neighbor search (build O(n log n); query O(log n) per point); refinement reuses the same k-d tree to assign boundary voxels. Voxelization typically reduces the data to ~25% of its original size before clustering, yielding efficiency gains.

Data and sample preparation (from Methods)
- Synthetic polycrystals: Voronoi tessellation; ~20 nm cubes (~500k atoms), 300 grains, periodic boundaries; benchmarking distributions exclude boundary atoms via CNA/extended CNA.
- Ice: CG MD LAMMPS trajectories, ~40 nm cube, ~2 million water molecules, up to 1.2 µs (0.1 ns frames).
- Polymers: Atomistic fixed-bond models; polysiloxane (~17k atoms, ~5×6×6 nm) and polyethylene (~33k atoms, ~8×9×8 nm); equilibrated up to 200 ns in LAMMPS using class2 potentials (COMPASS/PCFF) under NPT at 300 K.
- Micelles: CG MARTINI in NAMD; ~82×82×90 nm; 125k water, 1.5M dodecane, 120.4k surfactant-like molecules; equilibrated 200 ns at 300 K, 1 bar.
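The synthetic benchmark samples above are built by Voronoi tessellation: each atom belongs to the grain of its nearest seed point. A minimal sketch of that construction (numpy only; sizes are scaled down from the paper's 300-grain, ~500k-atom samples, atoms sit on a plain grid rather than a relaxed lattice, and the periodic minimum-image convention is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
n_grains, box = 10, 20.0  # reduced from 300 grains / ~20 nm cubes

# random Voronoi seed points, one per grain
seeds = rng.uniform(0.0, box, size=(n_grains, 3))

# stand-in "atoms" on a regular grid inside the box
g = np.linspace(0.5, box - 0.5, 16)
atoms = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)

# Voronoi assignment: each atom takes the label of its nearest seed
d2 = ((atoms[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=-1)
grain = d2.argmin(axis=1)

# grain size distribution (atoms per grain)
sizes = np.bincount(grain, minlength=n_grains)
```

The resulting `grain` labels are what the benchmarking distributions are computed from, after excluding boundary atoms via CNA/extended CNA.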
- General applicability: The unsupervised ML pipeline (topology classification + voxelization + DBSCAN + refinement) robustly identifies and characterizes diverse 3D microstructures (metal grains, polymer voids, micelles) without a priori microstructure descriptions and is insensitive to extended defects.
- Metals (benchmarking on synthetic 300-grain samples, ~20 nm cubes, ~500k atoms): Achieved >94% accuracy in predicting the total number of grains across fcc (Al), bcc (Fe), diamond (Si), and hcp (Ti) systems; accurately identified grains larger than ~200 atoms. Voxel bin sizes: ~4.0–4.5 Å; 40th-percentile thresholding of non-zero voxels used in these tests.
- Ice trajectory (CG water, ~2 million molecules): Efficient in-situ, frame-by-frame grain analysis across >1 µs trajectory with consistent size-based coloring of grains; extended CNA distinguishes hexagonal/cubic/stacking-disordered phases; voxel bin 5 Å.
- Polymers (polysiloxane, polyethylene): Identified both large and small voids; produced quantitative void volume distributions (e.g., polyethylene sample yielded 95 voids across ~2.5–10 nm³). Bin size 3 Å.
- Reverse micelles (CG): Extracted micellar size distribution via clustered water bead counts within micelles; example distribution included 1544 micelles. Bin size 8 Å for water density voxelization.
- Experimental superalloy (IN100): Local variance filtering effectively delineated grain boundaries in noisy 3D EBSD-reconstructed images, enabling grain segmentation and size distribution estimation. Down-sampling yielded substantial speedups (0.5× resolution → ~7×; 0.25× → ~29×) at the cost of small-feature detection.
- Robustness to noise and hyperparameters: With local variance filtering, method tolerates up to ~25% mislabeling in local structure at larger voxel sizes (e.g., 5.5 Å bin) and ~15% at smaller bins (4.5 Å), reflecting resilience due to down-sampling and density-based clustering. Trade-off: larger bins improve robustness and speed but lose fine detail.
- Computational efficiency: Voxelization reduces data size (~25% of atoms/beads), clustering is O(n log n) with k-d tree acceleration; suitable for real-time or in-situ analyses at large-scale facilities.
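As a concrete illustration of the clustering stage: with ε fixed to the first-nearest voxel shell, DBSCAN on a regular voxel grid reduces to connected-component labeling of sufficiently dense occupied voxels, plus attachment of border voxels. A minimal sketch under that assumption (requires scipy; `grid_dbscan` and its arguments are illustrative names, not the authors' code):

```python
import numpy as np
from scipy import ndimage

def grid_dbscan(occ, n_min=27):
    """Minimal DBSCAN on a boolean voxel grid.

    With eps limited to the 3x3x3 neighborhood, core voxels are occupied
    voxels with >= n_min occupied neighbors (self included); clusters are
    connected components of core voxels, and remaining occupied voxels
    adjacent to a core cluster join it as border points.
    """
    occ = occ.astype(bool)
    # occupied-neighbor count over the 3x3x3 stencil (self included)
    counts = ndimage.convolve(occ.astype(int), np.ones((3, 3, 3)),
                              mode="constant")
    core = occ & (counts >= n_min)
    # connected components of core voxels (26-connectivity = eps shell)
    labels, n = ndimage.label(core, structure=np.ones((3, 3, 3)))
    # attach border voxels to an adjacent core cluster
    spread = ndimage.maximum_filter(labels, size=3)
    border = occ & ~core & (spread > 0)
    labels[border] = spread[border]
    return labels, n
```

Because the neighbor search is a fixed stencil on the grid, this variant runs in O(n) per frame; a general k-d-tree DBSCAN, as described above, is O(n log n).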
The method addresses the need for fast, general, and accurate 3D microstructural characterization by combining topology-aware labeling with voxel-based preconditioning and density-based clustering. It overcomes limitations of 2D approaches and slice-stacking by operating directly on 3D data and is robust to extended defects (e.g., stacking faults, semi-amorphous domains) that can hinder standard classifiers. Applications to metals, polymers, micelles, and experimental superalloys demonstrate generality across inorganic and soft matter systems and across simulated and experimental data. Error sensitivity analysis shows resilience to noisy labels and variable voxelization settings due to data averaging (voxelization, filtering) and DBSCAN’s noise handling. However, there is a fundamental trade-off between robustness/speed and resolution: larger voxel bins and down-sampling accelerate processing and improve noise tolerance but reduce sensitivity to small features and fine structure within distributions. Thresholding remains a sensitive step; empirically chosen percentiles work well, but automated, data-driven threshold selection could further stabilize performance. For time-resolved analyses (e.g., grain evolution in ice), frame-wise clustering yields consistent size-based visualization, but lack of inter-frame label continuity complicates tracking individual grains; incorporating spatial/orientational correlation across frames would enable robust temporal tracking. Overall, the approach enables near-real-time analysis of large datasets from synchrotron and electron-based facilities and MD trajectories, providing unbiased microstructural statistics critical for linking structure to properties.
The work introduces an unsupervised ML pipeline for autonomous 3D microstructural characterization that integrates topology classification, voxel-based preconditioning, density-based clustering (DBSCAN), and refinement/back-mapping. It achieves high accuracy on synthetic polycrystals, robustly quantifies voids and micelles in soft materials, and effectively segments experimental superalloy microstructures, while being computationally efficient and broadly applicable. The approach is resilient to noise and parameter variations, and its efficiency makes it suitable for real-time facility data streams and in-situ MD analysis. Future directions include: (i) incorporating inter-frame correlation (spatial proximity, lattice orientation) to enable consistent grain tracking over time; (ii) automated optimization of hyperparameters (e.g., ε, Nmin) and threshold cutoffs via data-driven schemes (e.g., Otsu-like methods) to reduce sensitivity; (iii) exploring multi-resolution strategies to recover fine features while preserving speed; and (iv) extending to other microstructural features (inclusions, precipitates) and imaging modalities with domain-adapted preconditioning.
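Future direction (ii) can be prototyped directly: Otsu's method picks the histogram cutoff that maximizes between-class variance, replacing hand-tuned percentile thresholds. A numpy-only sketch (the function name is illustrative, and this is one possible data-driven scheme, not the paper's implementation):

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's method: choose the cutoff maximizing the between-class
    variance of the value histogram."""
    hist, edges = np.histogram(values, bins=nbins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)            # class-0 probability mass
    w1 = 1.0 - w0
    mu = np.cumsum(p * centers)  # class-0 mean times w0
    mu_t = mu[-1]                # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = 0.0
    return centers[np.argmax(between)]
```

Applied to the non-zero voxel intensities, such a cutoff could stand in for the empirically chosen ~40th/~90th percentiles and reduce the thresholding sensitivity noted below.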
- Thresholding sensitivity: Although percentile-based thresholds work well empirically (e.g., ~90th-percentile with local variance for simulations; ~40th-percentile for experimental images), results can vary with cutoff selection.
- Resolution trade-off: Larger voxel bins and down-sampling improve robustness and speed but reduce the ability to detect small grains or fine features in distributions.
- Labeling noise and classifiers: Reliance on local structure classifiers (CNA/extended CNA) means mislabeling or amorphous regions can affect boundary detection; robustness is finite (~15–25% mislabeling tolerance depending on bin size).
- Temporal tracking: Frame-by-frame analysis lacks inter-frame label continuity, complicating tracking of individual grains over time without additional correlation steps.
- Hyperparameter tuning: DBSCAN parameters (ε, Nmin) are adjusted heuristically (maximize cluster count), which may not always be optimal across datasets.
- Experimental artifacts: Noise and imaging artifacts in tomography/EBSD reconstructions necessitate careful preconditioning (e.g., local variance filtering) and may still impact segmentation quality.
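The heuristic Nmin tuning noted above (start strict, relax until the total cluster count is maximized) can be sketched as a simple sweep. This toy version counts clusters on a voxel grid with ε fixed to the first voxel shell (requires scipy; function names are illustrative, not the authors' code):

```python
import numpy as np
from scipy import ndimage

def count_clusters(occ, n_min):
    # core voxels: occupied voxels with >= n_min occupied neighbors
    # in the 3x3x3 stencil (self included)
    counts = ndimage.convolve(occ.astype(int), np.ones((3, 3, 3)),
                              mode="constant")
    _, n = ndimage.label(occ & (counts >= n_min),
                         structure=np.ones((3, 3, 3)))
    return n

def tune_n_min(occ, strict=27):
    # relax the density requirement from the strict value (27 in 3D)
    # and keep the n_min that maximizes the recovered cluster count
    best_n, best_nmin = -1, strict
    for n_min in range(strict, 0, -1):
        n = count_clusters(occ, n_min)
        if n > best_n:
            best_n, best_nmin = n, n_min
    return best_nmin, best_n
```

Relaxing Nmin lets small, low-density features (e.g., tiny grains) pass the core-point test; the sweep stops rewarding relaxation once no new clusters appear, which is the cluster-count-maximization criterion described in Process 2.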