Super-resolving microscopy images of Li-ion electrodes for fine-feature quantification using generative adversarial networks
O. Furat, D. P. Finegan, et al.
Orkun Furat, Donal P. Finegan, Zhenzhen Yang, Tom Kirstein, Kandler Smith, and Volker Schmidt show that super-resolution generative adversarial networks (SRGANs) can enhance the resolution of SEM images of cracked Li-ion battery cathodes. By easing the trade-off between imaged volume and resolution, the approach improves crack detection and supports more reliable quantitative analysis of microscopy data.
Introduction
The study addresses the challenge in materials characterization of balancing field of view with resolution in microscopy, particularly for heterogeneous Li-ion battery electrodes where fine features such as sub-micron cracks (<500 nm) must be quantified across representative volumes. High-resolution imaging captures detail but over small areas, risking non-representativity, while low-resolution imaging covers larger areas but misses fine features. The research question is whether generative adversarial networks, specifically SRGANs, can reliably super-resolve experimentally measured low-resolution SEM images of NMC cathode particles to recover fine features and thereby improve quantitative analysis (e.g., crack detection and characterization), mitigating the resolution–field-of-view trade-off. The work also examines performance when paired (registered) LR–HR images are available and when only unpaired datasets exist.
Literature Review
The paper situates the work within rapid advances in machine learning for computer vision and its increasing application to materials science, including classification, segmentation (e.g., U-Nets), and image synthesis (GANs). Prior studies demonstrated GANs for image synthesis and for supervised and unsupervised super-resolution, including applications to SEM and EBSD data. However, networks trained on synthetically downsampled images can perform poorly on experimentally measured low-resolution images because the noise and acquisition characteristics differ. Unsupervised approaches such as CycleGAN derivatives have been proposed to bridge this gap. Comprehensive surveys of super-resolution methods exist (e.g., Wang et al., 2021). In materials science, prior works used GANs or CNNs to super-resolve microscopy images of nanoparticles and EBSD maps, generally training on downsampled HR data. The paper builds on SRGAN (Ledig et al., 2017) and considers alternatives (U-Net-based GANs, SRResNet variants, and CinCGAN), with a focus on realistic SEM data from Li-ion cathode materials.
Methodology
Data and task: The data are SEM images of differently aged LiNi_xMn_yCo_zO_2 (NMC) cathode particles exhibiting cracks. The super-resolution factor between LR and HR images is α = 2.5. The HR images were denoised prior to training; network outputs are single-channel grayscale images with values in [0,1]. A dataset of 46 HR and 102 LR SEM images was available, from which 33 registered LR–HR pairs were used. These 33 pairs were split into 24 training, 5 validation, and 4 test pairs.
Network architectures: The primary model is a modified SRGAN. The generator G is an SRResNet with 16 residual blocks that uses ReLU activations (instead of PReLU) and omits BatchNormalization to improve SR accuracy and accommodate small batch sizes. To achieve α = 2.5 upscaling, the input is first upsampled by a factor of 1.25, and a single PixelShuffle layer (2× upscaling, since 1.25 × 2 = 2.5) precedes the output. The final layer uses a sigmoid activation and outputs a single-channel image. The discriminator D follows the architecture of Ledig et al., modified to omit BatchNormalization.
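The shape arithmetic behind the 2.5× upscaling can be checked with a minimal NumPy sketch. It is not the authors' TensorFlow implementation: the nearest-neighbour interpolation used for the 1.25× step, the helper names, and the depth-to-space channel ordering in `pixel_shuffle` are assumptions made for illustration, and the residual blocks between the two steps are left out.

```python
import numpy as np


def upsample_nearest(img: np.ndarray, factor: float) -> np.ndarray:
    """Upsample the first two axes by `factor` with nearest-neighbour lookup.

    Assumption: the interpolation method of the 1.25x step is not stated in
    the summary; nearest-neighbour is used here only to show the shapes.
    """
    h, w = img.shape[:2]
    new_h, new_w = int(round(h * factor)), int(round(w * factor))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return img[rows][:, cols]


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an (H, W, C*r*r) feature map into (H*r, W*r, C)."""
    h, w, c_in = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    x = x.reshape(h, w, r, r, c)      # split channels into an r x r block
    x = x.transpose(0, 2, 1, 3, 4)    # interleave block rows/cols with H/W
    return x.reshape(h * r, w * r, c)


if __name__ == "__main__":
    lr_cutout = np.zeros((96, 96))                  # one LR training cutout
    up = upsample_nearest(lr_cutout, 1.25)          # 96 -> 120
    # Stand-in for the convolutional features: 4 channels = 1 channel * 2 * 2.
    features = np.repeat(up[..., None], 4, axis=-1)
    out = pixel_shuffle(features, 2)                # 120 -> 240
    print(up.shape, out.shape)
```

Splitting the factor this way keeps the learned upsampling at an integer factor, which PixelShuffle requires, while the non-integer remainder is handled by plain interpolation.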
Losses and optimization: Training minimizes a perceptual loss based on VGG features (Φ from VGG-19, using features before the 2nd max-pooling layer) combined with an adversarial loss; the minimax objective weights the adversarial term by γ = 2.0. For paired training, each step uses random 96×96 LR cutouts and the corresponding 240×240 HR cutouts. Training uses Adam with a learning rate of 1e-4 and an effective batch size of 32 via gradient accumulation, alternating G and D updates. Early stopping is based on the validation perceptual loss PL_{2,2,19}, evaluated every 20 steps on 92 fixed validation cutout pairs.
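Drawing matching cutouts from a registered pair can be sketched as follows. This is an illustration under stated assumptions, not the authors' sampling code: it assumes each HR image covers exactly the same region as its LR partner at 2.5× the pixel dimensions, and it restricts LR offsets to even pixel indices so that the scaled HR offset is an integer. The function and constant names are hypothetical.

```python
import numpy as np

LR_SIZE = 96    # LR cutout edge length in pixels (from the text)
HR_SIZE = 240   # HR cutout edge length in pixels (96 * 2.5)


def sample_paired_cutout(lr: np.ndarray, hr: np.ndarray,
                         rng: np.random.Generator):
    """Return a random LR cutout and the HR cutout covering the same region.

    Assumptions: `hr` is registered to `lr` with the same origin and has
    2.5x the pixel dimensions; LR offsets are restricted to even values so
    that 2.5 * offset is a whole number of HR pixels.
    """
    max_i = (lr.shape[0] - LR_SIZE) // 2
    max_j = (lr.shape[1] - LR_SIZE) // 2
    i = 2 * int(rng.integers(0, max_i + 1))   # even LR row offset
    j = 2 * int(rng.integers(0, max_j + 1))   # even LR column offset
    hi, hj = (5 * i) // 2, (5 * j) // 2       # exact 2.5x scaling
    lr_cut = lr[i:i + LR_SIZE, j:j + LR_SIZE]
    hr_cut = hr[hi:hi + HR_SIZE, hj:hj + HR_SIZE]
    return lr_cut, hr_cut


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lr_img = rng.random((200, 200))   # placeholder LR image
    hr_img = rng.random((500, 500))   # placeholder HR image (2.5x larger)
    lr_cut, hr_cut = sample_paired_cutout(lr_img, hr_img, rng)
    print(lr_cut.shape, hr_cut.shape)
```

With an effective batch size of 32 reached through gradient accumulation, a sampler like this would be called once per accumulated sample before each optimizer update.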
Baselines and variants: Four additional networks were trained for comparison:
- U-NetGAN (based on de Haan et al., 2019): GAN with modified U-Net generator, trained with its original losses (including L1 and anisotropic total variation).
- SRResNet1: SRResNet generator only (no discriminator) trained with mean absolute error; architecture matches the SRGAN generator adapted to 2.5× upscale (1.25 upsample + PixelShuffle).
- SRResNet2: Supervised on synthetically downsampled HR data (no real LR), trained with mean absolute error as in Jung et al. (2021).
- CinCGAN (Yuan et al., 2018): An unsupervised cycle-in-cycle GAN for scenarios without paired data, consisting of a denoising GAN mapping real LR to synthetic LR-like downsampled HR, followed by a super-resolution GAN. The SR module was replaced with the SRResNet generator adapted for 2.5× upscale.
Training infrastructure: Implemented in TensorFlow, trained in under 10 hours on a single GPU (NVIDIA GeForce RTX 3060; system RAM 32 GB; AMD Ryzen 5 3600 CPU).
Evaluation metrics: On the test set (4 LR–HR pairs), predictions were compared to HR ground truth using MSE, VGG perceptual losses PL_{2,2,16} and PL_{2,2,19}, and mean structural similarity (MSSIM). For downstream analysis, crack segmentation was performed (modified Westhoff et al., 2018) on HR, bilinearly upsampled+denoised LR, and each super-resolved output to compute: (i) Jaccard index between predicted crack sets and HR crack set; (ii) relative error in specific crack density ρ; and (iii) L1 distance between fitted log-normal crack size distributions derived from connected-component area-equivalent diameters (excluding diameters < 50 nm).
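The downstream crack metrics can be illustrated with a short NumPy sketch. Several details are assumptions rather than the paper's definitions: the relative error is taken as |ρ̃ − ρ| / ρ, the log-normal parameters are fitted by maximum likelihood on the log-diameters, and the L1 distance is approximated with the trapezoidal rule on a user-supplied grid. Crack segmentation and connected-component labelling are not shown, and all function names are hypothetical.

```python
import numpy as np


def jaccard_index(a, b) -> float:
    """Intersection over union of two binary crack masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree fully
    return float(np.logical_and(a, b).sum() / union)


def relative_error(estimate: float, reference: float) -> float:
    """Assumed definition: |estimate - reference| / |reference|."""
    return abs(estimate - reference) / abs(reference)


def equivalent_diameter(area):
    """Diameter of the disc with the same area as a connected component."""
    return 2.0 * np.sqrt(np.asarray(area, dtype=float) / np.pi)


def fit_lognormal(diameters, d_min: float = 50.0):
    """Fit (mu, sigma) to diameters >= d_min (same unit, e.g. nm).

    Assumption: maximum-likelihood fit on log-diameters; the summary does
    not state which fitting procedure the authors used.
    """
    d = np.asarray(diameters, dtype=float)
    logs = np.log(d[d >= d_min])
    return float(logs.mean()), float(logs.std())


def lognormal_pdf(x, mu: float, sigma: float):
    """Log-normal probability density evaluated at x > 0."""
    x = np.asarray(x, dtype=float)
    return np.exp(-(np.log(x) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        x * sigma * np.sqrt(2.0 * np.pi))


def l1_distance(f, g, x) -> float:
    """Trapezoidal approximation of the integral of |f - g| over grid x."""
    y = np.abs(np.asarray(f, dtype=float) - np.asarray(g, dtype=float))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


if __name__ == "__main__":
    hr_mask = np.array([[1, 1], [0, 0]])
    sr_mask = np.array([[1, 0], [0, 0]])
    print(jaccard_index(hr_mask, sr_mask))
    grid = np.linspace(1.0, 5000.0, 200001)   # diameter grid in nm
    f_hr = lognormal_pdf(grid, np.log(150.0), 0.5)
    f_sr = lognormal_pdf(grid, np.log(170.0), 0.5)
    print(l1_distance(f_hr, f_sr, grid))
```

Because both fitted densities integrate to one, the L1 distance lies between 0 (identical distributions) and 2 (disjoint support), which gives the reported discrepancy values a fixed scale.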
Key Findings
- Quantitative super-resolution performance (test set averages):
  - SRGAN achieved the best scores among all models: MSE = 5.67e-03 (lowest), PL_{2,2,19} = 2.21e+01 (lowest), PL_{2,2,16} = 5.92e+01 (lowest), MSSIM = 0.888 (highest).
  - Competing models: U-NetGAN (MSE 5.70e-03, MSSIM 0.885), SRResNet1 (MSE 5.91e-03, MSSIM 0.880), SRResNet2 (MSE 6.03e-03, MSSIM 0.773), CinCGAN (MSE 5.70e-03, MSSIM 0.886). SRResNet2 underperformed notably, especially on noisy data.
- Crack segmentation and characterization improvements over upsampling:
  - Jaccard index for crack set vs HR: SRGAN 0.679 vs bilinear upsampling 0.556; other models: U-NetGAN 0.575, SRResNet1 0.615, SRResNet2 0.580, CinCGAN 0.577.
  - Specific crack density ρ (HR ground truth ρ = 0.123): relative error using SRGAN 0.036 vs bilinear 0.136; others: U-NetGAN 0.197, SRResNet1 0.072, SRResNet2 0.144, CinCGAN 0.068.
  - Crack size distribution discrepancy ||f − f̃||: SRGAN 0.053 vs bilinear 0.23; others: U-NetGAN 0.074, SRResNet1 0.216, SRResNet2 0.614, CinCGAN 0.23.
- Training on real LR data is critical: SRResNet2 (trained only on synthetic LR from downsampled HR) performed worse on real LR, especially with noise. CinCGAN, which incorporates real LR without pairing, outperformed SRResNet2 and approached U-NetGAN, but still lagged behind SRGAN trained on paired data.
- Visual assessments corroborated quantitative metrics: SRGAN preserved fine crack morphology more faithfully, supporting improved crack detection and size estimation.
Discussion
The findings demonstrate that SRGANs trained with perceptual loss on paired real LR–HR SEM data can effectively super-resolve images of NMC cathode particles, recovering fine crack features that are otherwise obscured in LR data. SRGAN outperforms the U-NetGAN and SRResNet baselines on MSE, perceptual losses, and MSSIM, and, crucially, in downstream crack segmentation and size-distribution analyses, where errors have a greater impact on scientific interpretation. The superiority over U-NetGAN is attributed to both architectural differences and the optimization objective (perceptual loss vs L1/TV). The results highlight the limitation of training exclusively on synthetically downsampled HR data (SRResNet2): such training fails to model the noise and acquisition artifacts of real LR data, which degrades performance, particularly under higher noise. In unpaired scenarios, CinCGAN proves a viable alternative, significantly improving over SRResNet2 by leveraging real LR images and cycle consistency, though it still trails SRGAN trained with paired data. The study also emphasizes that standard whole-image metrics (MSE, perceptual losses) are not fully sensitive to errors in minority phases such as cracks. Task-oriented metrics (Jaccard index for crack masks, specific crack density error, and distributional distances for crack sizes) better reflect the practical utility for materials analysis. Potential enhancements include multi-discriminator or feature-space adversarial training to further reduce discrepancies between super-resolved and true HR images, especially for small crack sizes.
Conclusion
This work demonstrates that SRGAN-based super-resolution can mitigate the resolution–field-of-view trade-off for SEM imaging of Li-ion cathode materials by accurately enhancing LR images (α = 2.5) and enabling representative quantification of fine crack features. SRGAN achieves superior reconstruction quality and substantially improves crack segmentation accuracy, specific crack density estimation, and crack size distribution fidelity compared to bilinear upsampling and alternative deep networks (U-NetGAN, SRResNet variants, CinCGAN). For scenarios lacking paired LR–HR data, CinCGAN offers a practical solution leveraging unpaired datasets. The approach is generalizable to other microscopy modalities and fine-feature analyses in materials science. Future directions include: expanding datasets to diverse materials and noise regimes; integrating additional discriminators or feature-space losses; exploring self-supervised or domain-adaptive training; and applying the method to large-scale studies of aging parameters and crack formation in NMC particles.
Limitations
- Limited number of paired LR–HR samples (33 pairs) with a small test set (4 pairs), which may constrain generalizability.
- HR images were denoised prior to training; performance may depend on denoising strategy and parameters.
- SR factor fixed at 2.5; different scaling factors may require retraining or architectural adjustments.
- Methods trained on synthetic LR (SRResNet2) underperform on real LR due to noise/domain mismatch, highlighting sensitivity to training data realism.
- Crack segmentation evaluation depends on a specific segmentation pipeline and excludes very small features (<50 nm), potentially biasing metrics.
- Unpaired methods (CinCGAN) improve over synthetic-only training but still lag behind the paired SRGAN; the availability of paired data remains advantageous.