Leveraging generative adversarial networks to create realistic scanning transmission electron microscopy images

Abid Khan, Chia-Hao Lee, Pinshane Y. Huang, and Bryan K. Clark

This research, conducted by Abid Khan, Chia-Hao Lee, Pinshane Y. Huang, and Bryan K. Clark, introduces a machine learning framework that lets models generalize efficiently across large electron microscopy datasets. Using a cycle generative adversarial network, their approach enhances simulated data with realistic detail while preserving ground-truth labels, streamlining the identification of single-atom defects under changing imaging conditions.

Introduction
Machine learning has been widely applied to electron microscopy, but supervised models require large labeled datasets. Simulations can provide labels but diverge from experiments because of detector noise, drift, scan distortions, alignment errors, radiation damage, contamination, and other factors that vary between sessions. As a result, ML models trained on narrow ranges of simulation parameters generalize poorly and often require retraining as imaging conditions change. The study addresses this simulation-to-experiment gap: producing realistic training images that preserve ground-truth labels, thereby enabling robust, adaptable defect identification under varying experimental conditions. The authors propose a CycleGAN-based image-to-image translation framework, enhanced with Fourier-space (reciprocal-space) discriminators, that transfers experimental noise and texture onto simulated STEM images while preserving structural content and labels; fully convolutional networks (FCNs) are then trained on the translated images to identify single-atom defects.
Literature Review
Prior work has applied ML across many electron microscopy tasks: atom localization, defect identification, denoising, tilt/thickness estimation, structure classification, convergence angle optimization, Bragg disk identification, deformation visualization, and automated alignment. Reviews outline the expanding interface of ML and microscopy. Simulation-to-experiment gaps are tied to statistical noise in STEM, scan instabilities and distortions, instrument instabilities, radiation damage, and contamination. Earlier approaches have used simulated Z-contrast for defect identification in 2D materials, but performance is sensitive to contamination and microscope conditions. GAN-based restoration and denoising methods have been explored in EM, but preserving ground-truth labels during style transfer and achieving quantitative similarity to experimental data remain challenging.
Methodology
Data and acquisition: Experimental annular dark-field (ADF) STEM datasets of monolayer graphene, monolayer WSe2, and bulk SrTiO3 were acquired on a Thermo Fisher Themis-Z at 80 kV. Typical settings for WSe2 and SrTiO3 were a 25 mrad convergence semi-angle, 35 pA probe current, 63–200 mrad collection semi-angles, 14–20 pm pixel size, 20 µs dwell time per pixel, and 10-frame averages; graphene was imaged at 100 pA with a 25 mrad inner collection angle after annealing at 1000 °C. Automated scripts acquired large WSe2 datasets on two separate days (Day A: 107 images; Day B: 211 images), giving a combined Day AB set of 318 images.

Simulation: Out-of-the-box incoSTEM (Computem) simulations generated semi-quantitative ADF-STEM images together with defect labels. Baseline no-noise simulations omitted counting noise, source size, and aberrations. A manually optimized noise baseline added contamination extracted from experiments, Poisson and Gaussian noise, image shear, finite source size, aberrations up to second order, brightness/contrast adjustment, and slight random perturbations of atomic positions (standard deviation 0.01 Å) to avoid periodic sampling artifacts.

CycleGAN architecture: Two generators (G: X→Y and F: Y→X; U-Net style with instance normalization) are paired with four PatchGAN discriminators: real-space image discriminators DX,img and DY,img, and reciprocal-space discriminators DX,FFT and DY,FFT. The FFT discriminators operate on log power spectra, log|FT(I)|^2, to enforce realistic reciprocal-space characteristics such as low-frequency contamination and high-frequency noise (see the spectrum sketch below). Each patch discriminator outputs a 30×30 grid of scores, each scoring the realness of a 70×70 patch (a PatchGAN sketch also follows this section). Identity and cycle-consistency constraints ensure the mappings preserve content and remain reversible.

Losses and optimization: Discriminators use least-squares GAN losses. Each generator minimizes a total loss L(G) = Ladv(G) + λcyc Lcyc(F, G) + λid Lid(G), and analogously for F, where the cycle-consistency loss is L1, the identity loss is L2, λcyc = 10, and λid = 5. The FFT discriminator losses are included in the generator adversarial term (a loss sketch appears below).

Training details: Images were saturated at ±3.5σ, normalized to [−1, 1], and cut into 256×256 patches (sketched below); training used a batch size of 42 with random 90° rotations and flips. To give the out-of-the-box simulations some variability, Gaussian noise with standard deviation 0.1 was added, with no manual tuning needed. Training ran for 297 epochs, with a learning rate of 0.032 for the first 148 epochs, linearly decayed to 0 by epoch 297. The model was implemented in TensorFlow and trained on an NVIDIA A40 GPU (about 6 hours from scratch). Separate CycleGANs were trained per material and per daily condition set (A, B, AB) because styles and imaging conditions are specific to each.

Evaluation metrics: Quantitative similarity was measured with the Fréchet Inception Distance (FID), computed from Inception v3 features, and with the KL divergence between normalized pixel-intensity histograms (a KL sketch appears below). Roughly 1700 256×256 patches per dataset were used for these metrics.

FCN architecture and training: A fully convolutional network (same architecture as prior work) was trained with the Adam optimizer and categorical cross-entropy loss. From each 1024×1024 image, 16 non-overlapping 256×256 patches (stride 256) were extracted; each patch was augmented by rotations, flips, and scale jittering to yield 24 variants (384 per image; see the augmentation sketch below). Training used 107 1K-resolution images, with patches split 10:1 into training and simulated test sets; each epoch sampled 1000 patches, for 500 epochs total.

FCN test sets: Experimental test sets comprised three images from Day A and three from Day B (labeled A and B), spanning 4–8 hours of acquisition. Single Se vacancy labels were generated by intensity-based criteria with minor Fourier-filtering assistance; images averaged about 117 defects (A) and 177 defects (B) over a 21×21 nm² field of view at 1024×1024 pixels. A combined AB test set contains all six images.

Workflow: Simulate labeled images; train the CycleGAN on experimental and simulated images; transform the simulations into realistic images while preserving their labels; and train the FCN on these images for defect identification in experimental data.
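The FFT discriminators judge log power spectra rather than raw images. A minimal sketch of that transform, assuming NumPy (the function name is illustrative, not from the paper's code):

```python
import numpy as np

def log_power_spectrum(image, eps=1e-8):
    """Log power spectrum log(|FT(I)|^2), shifted so the
    zero-frequency component sits at the center of the array."""
    ft = np.fft.fftshift(np.fft.fft2(image))
    return np.log(np.abs(ft) ** 2 + eps)  # eps guards against log(0)
```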
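Each discriminator maps a 256×256 input to a 30×30 grid of patch-realness scores, each covering a 70×70 receptive field. A sketch with that output geometry, following the standard pix2pix-style 70×70 PatchGAN in tf.keras; the filter counts and the omission of normalization layers are assumptions, not taken from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_patch_discriminator(shape=(256, 256, 1)):
    """70x70 PatchGAN: maps a 256x256 image to a 30x30 grid of
    realness scores, one per overlapping 70x70 patch."""
    inp = layers.Input(shape=shape)
    x = inp
    for filters in (64, 128, 256):                     # 256 -> 128 -> 64 -> 32
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.ZeroPadding2D()(x)                      # 32 -> 34
    x = layers.Conv2D(512, 4, strides=1)(x)            # 34 -> 31
    x = layers.LeakyReLU(0.2)(x)
    x = layers.ZeroPadding2D()(x)                      # 31 -> 33
    out = layers.Conv2D(1, 4, strides=1)(x)            # 33 -> 30; linear output for LSGAN
    return tf.keras.Model(inp, out)
```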
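The generator objective combines least-squares adversarial terms from both the image and FFT discriminators with an L1 cycle-consistency term (λcyc = 10) and an L2 identity term (λid = 5). A minimal one-direction sketch in TensorFlow, assuming the tensors are produced elsewhere in the training loop; names are illustrative:

```python
import tensorflow as tf

LAMBDA_CYC, LAMBDA_ID = 10.0, 5.0           # weights reported in the paper
mse = tf.keras.losses.MeanSquaredError()    # least-squares GAN objective

def generator_loss(d_img_fake, d_fft_fake, real_x, cycled_x, real_y, same_y):
    """Total loss for generator G: X -> Y (simplified to one direction;
    the full objective sums the symmetric terms for F as well).
    d_img_fake, d_fft_fake: discriminator outputs on G(x) and on the
    log power spectrum of G(x); cycled_x = F(G(x)); same_y = G(y)."""
    adv = mse(tf.ones_like(d_img_fake), d_img_fake) \
        + mse(tf.ones_like(d_fft_fake), d_fft_fake)    # image + FFT adversarial
    cyc = tf.reduce_mean(tf.abs(real_x - cycled_x))     # L1 cycle consistency
    ident = tf.reduce_mean(tf.square(real_y - same_y))  # L2 identity
    return adv + LAMBDA_CYC * cyc + LAMBDA_ID * ident
```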
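Before training, images are saturated at ±3.5σ, rescaled to [−1, 1], and tiled into 256×256 patches. A sketch of that preprocessing, assuming NumPy:

```python
import numpy as np

def preprocess(image, clip_sigma=3.5, patch=256):
    """Clip to +/- clip_sigma std around the mean, rescale to [-1, 1],
    and tile into non-overlapping patch x patch tiles."""
    mu, sigma = image.mean(), image.std()
    lo, hi = mu - clip_sigma * sigma, mu + clip_sigma * sigma
    clipped = np.clip(image, lo, hi)
    scaled = 2.0 * (clipped - lo) / (hi - lo) - 1.0
    h, w = scaled.shape
    patches = [scaled[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch)]
    return np.stack(patches)   # a 1024x1024 image yields 16 patches
```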
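The histogram-based similarity metric is a KL divergence between normalized pixel-intensity histograms. A sketch assuming NumPy; the bin count is an assumption:

```python
import numpy as np

def histogram_kl(img_a, img_b, bins=256, eps=1e-10):
    """KL divergence D_KL(P || Q) between normalized pixel-intensity
    histograms of two images, computed over a shared intensity range."""
    lo = min(img_a.min(), img_b.min())
    hi = max(img_a.max(), img_b.max())
    p, _ = np.histogram(img_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(img_b, bins=bins, range=(lo, hi))
    p = p.astype(float) + eps   # eps avoids division by zero / log(0)
    q = q.astype(float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```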
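For FCN training, each patch is expanded to 24 augmented copies via rotations, flips, and scale jittering. One plausible decomposition is the 8 rotation/flip (dihedral) variants crossed with 3 scales; the exact recipe is an assumption. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.ndimage import zoom

def dihedral_variants(patch):
    """The 8 rotation/flip variants of a square patch."""
    out = []
    for k in range(4):
        r = np.rot90(patch, k)
        out += [r, np.fliplr(r)]
    return out

def scale_jitter(patch, scale, size=256):
    """Spatially rescale, then center-crop or zero-pad back to size x size."""
    z = zoom(patch, scale, order=1)
    if z.shape[0] >= size:                       # center crop
        o = (z.shape[0] - size) // 2
        return z[o:o + size, o:o + size]
    out = np.zeros((size, size), dtype=z.dtype)  # zero-pad
    o = (size - z.shape[0]) // 2
    out[o:o + z.shape[0], o:o + z.shape[1]] = z
    return out

def augment(patch, scales=(0.9, 1.0, 1.1)):
    """8 dihedral variants x 3 scales = 24 augmented patches per input."""
    return [scale_jitter(v, s) for v in dihedral_variants(patch) for s in scales]
```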
Key Findings
- CycleGAN realism and quantitative similarity:
  - For WSe2, FID relative to experiment: 32 for the no-noise simulation, 0.73 for the simulation with manually optimized noise, and 0.35 for CycleGAN output (the lowest non-zero value and best match). The experimental baseline FID is 0.00 by definition (a Fréchet-distance sketch follows this list).
  - Across materials (Table 1), FID vs. experiment: graphene simulation 32.69 vs. CycleGAN 1.66; WSe2 simulation 31.87 vs. CycleGAN 0.35; SrTiO3 simulation 19.60 vs. CycleGAN 0.47.
  - KL divergence of WSe2 pixel-intensity histograms: experiment 0.00; simulation 0.33; manual noise 0.18; CycleGAN 0.01 (closest to experiment).
  - Power-spectrum analysis shows CycleGAN-processed images have the power distribution closest to experimental images.
  - The FFT discriminators are critical; without them, artifacts such as streaking in graphene emerge.
- Label preservation: The CycleGAN transfers local texture and noise while preserving defect types and positions from the simulated inputs (Fig. 6), so simulation-derived labels can be used directly for supervised training.
- FCN defect identification performance (precision P, recall R, F1, in %) on experimental data (Table 2):
  - Training on no-noise simulations performs poorly (e.g., AB: P=22, R=83, F1=35).
  - Training on manually optimized noise performs strongly (AB: P=74, R=92, F1=82).
  - Training on CycleGAN-processed images matches this performance with far less manual effort. Results are best when the CycleGAN training day and the test day match, indicating adaptation to daily conditions (CycleGAN-A on A: P=87, R=91, F1=89; CycleGAN-B on B: P=85, R=79, F1=82; CycleGAN-AB on AB: P=89, R=63, F1=74). The highest precision observed was 98% (CycleGAN-AB tested on A), with 89% precision on AB.
- Data efficiency and adaptability: Comparable FCN performance can often be reached with as few as 6 experimental images used to fine-tune the CycleGAN, though this is occasionally unstable. Training a CycleGAN from scratch takes about 6 hours; updating a pre-trained model with a small new dataset can be significantly faster.
- Scope: CycleGANs trained per material system generate images nearly indistinguishable from real data in both real and reciprocal space, enabling robust, scalable training-data generation.
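The FID values above compare Inception v3 feature distributions. For reference, a minimal sketch of the Fréchet distance between two sets of feature vectors, assuming NumPy and SciPy; the Inception v3 feature-extraction step is omitted:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """Frechet distance between two feature sets (rows = samples),
    each summarized as a multivariate Gaussian."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```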
Discussion
The work directly addresses the simulation-to-experiment domain gap that undermines the generalizability of ML models in STEM. By incorporating both image- and Fourier-space discriminators and enforcing cycle-consistency and identity constraints, the CycleGAN transfers experimental noise and texture to simulated images while preserving structural content and defect labels. Quantitatively, the strong reductions in FID and KL divergence demonstrate that the generated images closely match experimental distributions. This realism translates to practical gains: FCNs trained on CycleGAN outputs achieve high precision and recall on real experimental data without laborious manual simulation parameter tuning. Performance peaks when the CycleGAN is trained on images from the same day as the test set, suggesting it captures session-specific instrument and contamination conditions. The approach therefore provides a path to dynamic, near real-time adaptability of ML pipelines to evolving microscope conditions, facilitating autonomous, large-scale defect identification and potentially other tasks in electron microscopy.
Conclusion
The study introduces a CycleGAN-based framework with reciprocal-space discriminators to transform simulated STEM images into realistic, experiment-like images while preserving labels. This enables training FCNs that accurately identify single-atom defects in large, automatically acquired datasets under varying experimental conditions, with performance comparable to manually optimized training regimes and far less human intervention. The method scales readily, requires relatively small experimental datasets to adapt to new conditions, and can be extended to other materials and imaging modalities with appropriate forward models. Future directions include integrating with advanced model ensembles for improved generalization, incremental or continual learning to adapt CycleGANs in near real time as conditions evolve, and applying the approach across broader microscopy tasks (e.g., denoising, segmentation, quantitative measurements) to advance autonomous microscopy.
Limitations
Generalization across distinct conditions is limited: CycleGANs trained on one day or condition set often perform worse on images from a different day, indicating sensitivity to session-specific factors. Separate CycleGANs are needed per material system and condition regime due to large shape/style differences. Training from scratch takes roughly 6 hours on a single GPU; small-data fine-tuning can still occasionally yield unstable FCN performance. FFT discriminators use only amplitude information, not phase; while expected to be sufficient here, certain applications might benefit from phase-aware designs. The approach depends on the availability of a reasonable forward simulation model and may require curation when microscope conditions shift substantially.