Deep learning for three-dimensional segmentation of electron microscopy images of complex ceramic materials

Y. Hirabayashi, H. Iga, et al.

This study by Yu Hirabayashi, Haruka Iga, Hiroki Ogawa, Shinnosuke Tokuta, Yusuke Shimada, and Akiyasu Yamamoto applies neural networks to the recognition of intricate microstructures in polycrystalline ceramics, achieving an IoU of 94.6%. Their U-Net model reconstructs giga-scale 3D images in minutes, pointing toward routine high-resolution 3D material analysis.

Introduction
Controlling microstructure is essential to the performance of functional materials, and recent advances enable acquisition of 3D structural information (optical microscopy, X-ray CT, FIB-SEM serial sectioning, TEM tomography). These techniques produce 3D voxel data that capture connectivity, shape, and surface topography, but the data volumes are large and require objective, automated analysis. Semantic segmentation is key to pixel-wise phase identification, yet for electron microscopy of polycrystalline ceramics it has largely relied on manual expert labeling because of weak contrast and imaging artifacts, and classical thresholding methods are often inadequate. Deep learning models such as FCN and U-Net have advanced segmentation in other domains. This study targets polycrystalline iron-based superconducting ceramics, applying neural-network semantic segmentation to 3D FIB-SEM secondary electron images, benchmarking against Otsu and Sauvola thresholding, and demonstrating rapid giga-scale 3D microstructure reconstruction at high voxel resolution.
Literature Review
Semantic segmentation approaches span classical computer vision (e.g., global Otsu thresholding and local adaptive Sauvola thresholding) and modern machine learning (FCN, U-Net, DeepLab). Thresholding can be effective when histogram peaks correspond to phase contrasts and is widely used in materials domains (superconductors, batteries, thermoelectrics, nanoporous materials, geomaterials, superalloys). Deep learning-based segmentation has achieved strong results in image recognition and medical imaging, with FCN introducing end-to-end learning and skip connections later refined in U-Net and DeepLab. In materials science, much recent deep-learning segmentation focuses on X-ray CT because 3D acquisition is easy and many samples are X-ray transparent; fewer studies tackle electron-microscopy-based 3D microstructures despite their higher resolution and applicability to light elements. Prior work on steel microstructure segmentation and 3D CNNs for tomography demonstrates the promise of deep learning for complex microstructures, motivating its adaptation to EM images of ceramics with voids, secondary phases, and artifacts.
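Both classical baselines are available off the shelf. The following is a minimal sketch assuming the OpenCV (Otsu) and scikit-image (Sauvola with Gaussian pre-filter) implementations named in the Methodology; the synthetic input image, sigma, and window size are illustrative placeholders, not the authors' settings.

```python
import cv2
import numpy as np
from skimage.filters import gaussian, threshold_sauvola

# Placeholder for one secondary-electron slice (the real input would be
# a 1100x924 crop of an 8-bit SEM image).
rng = np.random.default_rng(0)
img = rng.normal(128, 40, (924, 1100)).clip(0, 255).astype(np.uint8)

# Global Otsu: one threshold for the whole image, chosen from its histogram.
_, otsu_mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local Sauvola: a per-pixel threshold from a sliding window, applied after
# a Gaussian pre-filter to suppress salt-and-pepper noise.
smoothed = gaussian(img, sigma=1)                      # sigma is an assumption
thresh = threshold_sauvola(smoothed, window_size=25)   # window size is an assumption
sauvola_mask = smoothed > thresh
```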
Methodology
Sample preparation: A polycrystalline Ba(Fe,Co)2As2 (Ba122) superconductor was synthesized by high-energy planetary ball milling of elemental metals to the BaFe1.84Co0.16As2 composition, followed by vacuum sintering at 600 °C for 48 h. All powder handling took place in an Ar glove box to minimize oxygen contamination.

3D-SEM imaging: Serial sectioning via FIB-SEM (Thermo Scientific Helios 600i) acquired 620 secondary electron images (5 kV, ET detector). The Ga-ion and electron columns were mounted at a 52° angle to each other, and images were obtained by milling the center of the sample from a 38° direction. Each image had 1536×1024 pixels; stacking with a 20 nm z-pitch yielded voxel dimensions (x, y, z) ≈ (20.8 nm, 26.4 nm, 20 nm). A central 1100×924 region was selected for segmentation to exclude empty areas.

Ground-truth creation: Training images were produced by manual pixel-wise segmentation of secondary electron images into positive (superconducting phase) and negative (defects such as voids and impurities) classes. Initial drafts were made by bucket-filling pre-classified eight-tone regions, followed by three rounds of visual inspection and correction by experienced graduate students. Ambiguous boundaries adjacent to voids were resolved using depth continuity from neighboring slices (leveraging the 3D stack) to improve the depth accuracy of the labels. A separate manually segmented test image (1100×924) from a distinct z-position was prepared to avoid overfitting and overestimation.

Automated dataset generation and augmentation: From paired original/label images, training datasets were created by randomly cropping to 256×256 and applying random rotations and flips to produce 1000 images per dataset; 1000-image test datasets were generated from the 1100×924 pair in the same way. Ten training and ten test datasets were produced via this data expansion (sketched below).

Models: Four approaches were compared: Otsu global thresholding (OpenCV), Sauvola local adaptive thresholding (scikit-image with a Gaussian pre-filter), deep-learning FCN models (FCN-32s, FCN-16s, FCN-8s), and U-Net. The FCNs upsample with skip connections at increasing resolutions (none, one, and two, respectively), while U-Net concatenates encoder features into the decoder path at every resolution to retain spatial detail and learn fine structures.

Training and implementation: The deep models were implemented in Python 3.8.8 with TensorFlow 2.4.1 and trained on an Nvidia Quadro RTX5000 (16 GB). The loss function was BCE + Dice loss. The learning rate followed a step-decay schedule with initial_lr = 0.001, decay factor γ = 0.5, and step_size = 20 epochs, for 120 total epochs (~2 h training time). Inference was fast: segmenting 620 images at 768×768 took a few minutes, versus several days for manual labeling of a single 896×896 image (~803,000 pixels).

Evaluation: Performance was measured via confusion-matrix metrics: Precision = TP/(TP+FP), Recall = TP/(TP+FN), and Intersection over Union (IoU) = TP/(TP+FP+FN), as sketched below. Models were also evaluated by their 3D reconstructions and by positive-phase ratio statistics across the 620 z-slices.
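A minimal sketch of the dataset-expansion step, assuming a NumPy pipeline: random 256×256 crops with random rotations and flips applied identically to image and label. The placeholder arrays and random seed are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders for the paired 1100x924 original/label images.
original = np.zeros((924, 1100), dtype=np.uint8)
ground_truth = np.zeros((924, 1100), dtype=np.uint8)

def random_patch(image, label, size=256):
    """Crop a random size x size window, then rotate/flip image and label together."""
    h, w = image.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    img = image[y:y + size, x:x + size]
    lab = label[y:y + size, x:x + size]
    k = int(rng.integers(0, 4))              # rotate by 0/90/180/270 degrees
    img, lab = np.rot90(img, k), np.rot90(lab, k)
    if rng.integers(0, 2):                   # random horizontal flip
        img, lab = np.fliplr(img), np.fliplr(lab)
    return img, lab

# 1000 augmented image/label pairs per dataset, as described above.
pairs = [random_patch(original, ground_truth) for _ in range(1000)]
```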
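To make the architectural contrast concrete, here is a compact U-Net sketch in TensorFlow/Keras showing encoder features concatenated into the decoder at every resolution. The depth, filter counts, and layer choices are assumptions for illustration, not the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = tf.keras.Input(shape=(256, 256, 1))
skips, x = [], inputs
for f in (32, 64, 128):                       # encoder: convolve, then downsample
    x = conv_block(x, f)
    skips.append(x)                           # keep features for the decoder
    x = layers.MaxPooling2D(2)(x)
x = conv_block(x, 256)                        # bottleneck
for f, skip in zip((128, 64, 32), reversed(skips)):
    x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])       # skip connection at this resolution
    x = conv_block(x, f)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel phase probability
unet = tf.keras.Model(inputs, outputs)
```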
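A minimal sketch of the stated training setup: BCE + Dice loss and the step-decay schedule (initial_lr = 0.001, γ = 0.5, step_size = 20 epochs). The Dice smoothing constant and the equal weighting of the two loss terms are assumptions.

```python
import tensorflow as tf

def bce_dice_loss(y_true, y_pred, smooth=1.0):
    """Binary cross-entropy plus (1 - Dice coefficient), equally weighted (assumed)."""
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    intersection = tf.reduce_sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
    return bce + (1.0 - dice)

def step_decay(epoch, lr=None):
    """initial_lr = 0.001, halved (gamma = 0.5) every 20 epochs."""
    return 0.001 * 0.5 ** (epoch // 20)

# Hypothetical usage with the U-Net sketched above and an assumed train_ds:
# unet.compile(optimizer="adam", loss=bce_dice_loss)
# unet.fit(train_ds, epochs=120,
#          callbacks=[tf.keras.callbacks.LearningRateScheduler(step_decay)])
```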
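The evaluation metrics follow directly from the confusion matrix. A minimal NumPy sketch, assuming boolean masks in which True marks the positive (superconducting) phase:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Precision, recall, and IoU from boolean positive-phase masks."""
    tp = np.sum(pred & truth)    # true positives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    return precision, recall, iou
```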
Key Findings
Quantitative performance (mean ± std over datasets):
- Otsu: Precision 0.9417 ± 0.0005; Recall 0.6478 ± 0.0049; IoU 0.6136 ± 0.0047
- Sauvola: Precision 0.9316 ± 0.0004; Recall 0.9936 ± 0.0001; IoU 0.9246 ± 0.0004
- FCN-32s: Precision 0.8456 ± 0.0003; Recall 0.9524 ± 0.0005; IoU 0.8095 ± 0.0004
- FCN-16s: Precision 0.9147 ± 0.0002; Recall 0.9425 ± 0.0003; IoU 0.8642 ± 0.0002
- FCN-8s: Precision 0.9597 ± 0.0001; Recall 0.9574 ± 0.0001; IoU 0.9188 ± 0.0002
- U-Net: Precision 0.9751 ± 0.0002; Recall 0.9712 ± 0.0001; IoU 0.9464 ± 0.0002

U-Net achieved the highest overall IoU (94.6%), among the highest reported for complex polycrystalline ceramics with challenging contrast and voids. Otsu suffered from low recall because salt-and-pepper noise was misclassified; Sauvola had the best recall but lower precision due to false positives in bright regions.

3D reconstruction: The deep models (FCNs, U-Net) and Sauvola produced smoother, z-continuous reconstructions than Otsu, which showed discontinuous artifacts. U-Net and FCN-8s captured detailed void structures, whereas FCN-32s emphasized global, coarse features. The positive-phase fraction per slice varied smoothly in z across the 620 layers for Sauvola and the deep models. Mean ± SD of per-slice positive-phase ratios (a sketch of this statistic follows this section): Otsu 0.6466 ± 0.0649; Sauvola 0.8272 ± 0.0111; FCN-32s 0.8743 ± 0.0104; FCN-16s 0.8009 ± 0.0137; FCN-8s 0.7720 ± 0.0151; U-Net 0.7728 ± 0.0143. The expert-labeled training and test images had positive-phase ratios of 74.2% and 79.7%, and the U-Net and FCN-8s predictions fell within ~2% of these values.

Efficiency: Inference segmented the 620 images (768×768) within minutes, versus several days for manual labeling of a single 896×896 image. Voxel resolution was 20 nm in z, exceeding typical laboratory X-ray CT.
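The per-slice statistic is straightforward to reproduce from a segmented stack. A minimal sketch, assuming a boolean (z, y, x) array assembled from the per-slice masks; the placeholder array stands in for real segmentation output.

```python
import numpy as np

# Placeholder for the segmented stack: 620 z-slices of boolean masks
# in which True marks the positive (superconducting) phase.
stack = np.zeros((620, 924, 1100), dtype=bool)

ratios = stack.mean(axis=(1, 2))      # positive-phase fraction per z-slice
print(ratios.mean(), ratios.std())    # compare with, e.g., 0.7728 +/- 0.0143 for U-Net
```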
Discussion
Deep learning segmentation, especially U-Net, is substantially more robust to electron-microscopy-specific artifacts (polishing marks, depth-induced edge brightness, background intensity gradients) and noise than classical thresholding. By exploiting local and multi-scale context via convolutions and skip connections, the models maintain high accuracy regardless of global brightness shifts and identify fine voids and boundaries. Although U-Net attains state-of-the-art IoU for this ceramic system, expert performance is still higher in certain ambiguous regions. The method enables rapid, accurate 3D voxel reconstructions, facilitating quantitative analyses (e.g., 3D connectivity, internal surface area, curvature) that 2D approaches cannot reliably capture. Such 3D microstructure data directly inform the understanding of transport-related functionalities (e.g., superconducting critical current paths, thermal/electrical percolation, ionic conduction) in which 3D phase connectivity and texture govern macroscopic properties. The approach supports integration of 3D-SEM datasets into simulations and process informatics, advancing towards digital twins that bridge experiments (including large-area, in situ, and operando 3D observations) and multi-scale modeling.
Conclusion
The study demonstrates accurate, fast semantic segmentation of 3D FIB-SEM electron micrographs of complex polycrystalline ceramics using deep neural networks. U-Net achieved an IoU of 94.6% and, together with FCN-8s, enabled faithful giga-voxel 3D reconstructions at 20 nm voxel resolution within minutes, far outpacing manual segmentation. Curating training labels with depth-aware consistency (using neighboring slices) was crucial to performance. Results indicate that deep learning with datasets encoding depth information is essential for reliable 3D microstructure quantification in ceramics. Future work includes expanding and balancing training datasets (e.g., more impurity-phase examples), refining architectures to improve receptive-field balance and long-range context, extending to fully 3D CNNs, and integrating the resulting 3D microstructures into operando analyses, multi-physics simulations, and data-driven process optimization. Public release of datasets and code will further catalyze progress.
Limitations
Failure cases highlight current constraints: (1) Impurity phases with dark contrast were incompletely segmented, likely because they were scarce in the training data (only six examples). (2) U-Net misidentified regions with few defects (over-segmenting voids), attributed to a relatively narrow effective receptive field and dominant void-recognition filters. (3) A peaked superconducting phase within a void (a submarine-ridge-like feature) was misclassified by both thresholding and U-Net; FCN-8s performed better by leveraging more global context. (4) Island-like superconducting regions surrounded by voids were not reliably identified by the neural models, though thresholding sometimes succeeded; these cases are challenging even for human experts. Overall, segmentation remains inferior to expert annotation at some depth-ambiguous boundaries; model and dataset enhancements, including more diverse examples and potentially 3D context, are needed.