
Physics
Three-dimensional coherent X-ray diffraction imaging via deep convolutional neural networks
L. Wu, S. Yoo, et al.
Longlong Wu, Shinjae Yoo, Ana F. Suzana, Tadesse A. Assefa, Jiecheng Diao, Ross J. Harder, Wonsuk Cha, and Ian K. Robinson present a 3D machine-learning model that improves phase-retrieval accuracy in coherent X-ray diffraction imaging. The approach matches or surpasses traditional iterative methods while delivering reconstructions fast enough for real-time experiments.
Introduction
Coherent X-ray diffraction imaging (CDI) enables characterization of internal 3D structures of single particles, with Bragg CDI allowing 3D strain imaging of crystals. As next-generation X-ray sources improve coherent flux, time-resolved and in-situ CDI will probe nanoscale dynamics. Because diffraction measurements lose phase, numerical phase retrieval is required to reconstruct real-space complex structures. Traditional iterative projection-based methods (e.g., HIO, DM, RAAR) guarantee uniqueness under ideal oversampling for finite objects, but with noisy experimental data they can become trapped in local minima, leading to ambiguous solutions and requiring thousands of iterations, algorithm switching, parameter tuning, and expert intervention. Deep learning has recently emerged for phase retrieval, offering rapid reconstructions, but most approaches are supervised and require large labeled datasets, which are scarce in experimental CDI and may generalize poorly when trained on limited data. This work addresses the need for accurate, fast, and robust 3D phase retrieval by introducing a 3D CNN that can operate in supervised mode for real-time inference and in unsupervised mode (with or without pretraining) to refine or directly retrieve phases from measured diffraction intensities.
Literature Review
The paper reviews iterative projection algorithms for CDI phase retrieval, including Hybrid Input-Output (HIO), Difference Map (DM), and Relaxed Averaged Alternating Reflections (RAAR). These methods can, in theory, recover unique solutions for finite objects under sufficient Fourier modulus oversampling but often struggle with noisy experimental data due to local minima and sensitivity to initialization, necessitating extensive iterations, algorithm switching, and parameter tuning. Maximum-likelihood refinements and convex optimization perspectives have been explored to improve robustness. On the ML front, rapid advances have been made for 2D CDI inversion with CNNs and for 3D with adaptive approaches using spherical harmonics. However, most ML solutions are supervised, requiring large paired datasets of diffraction patterns and ground-truth real-space objects, which are difficult to obtain experimentally; limited training data can reduce generalization, often necessitating subsequent refinement. The present work extends prior ML approaches by proposing a comprehensive 3D encoder–decoder CNN that supports both supervised learning for fast inference and unsupervised optimization directly against measured intensities, either initialized from a pretrained network (transfer learning) or from random weights, thus mitigating data scarcity and initialization sensitivity.
Methodology
Model architecture: A 3D encoder–decoder CNN processes the amplitude of a 3D coherent X-ray diffraction pattern (reciprocal space) and outputs two channels: real-space amplitude and phase of the complex particle density. The network comprises only 3D convolutional blocks, max pooling in the encoder, and upsampling in the decoders. Convolutional blocks use sequences of 3×3×3 kernels with Leaky ReLU activations and batch normalization, interleaved with factorized convolutions (3×1×1, 1×3×1, 1×1×3) plus LReLU and BN. The final layer uses ReLU. Output arrays (amplitude and phase) have half the linear size of the input diffraction array to keep the inversion overdetermined.
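A quick, illustrative parameter count shows why the factorized (3×1×1, 1×3×1, 1×1×3) convolutions are interleaved with the full 3×3×3 blocks. The channel width of 64 below is an assumption for illustration; the summary above does not state the network's channel sizes.

```python
def conv3d_params(kernel, c_in, c_out):
    """Number of weights in a 3D convolution (no bias)."""
    kx, ky, kz = kernel
    return kx * ky * kz * c_in * c_out

c = 64  # example channel width (assumed, not stated in the paper summary)
full = conv3d_params((3, 3, 3), c, c)            # one full 3x3x3 kernel
factored = (conv3d_params((3, 1, 1), c, c)
            + conv3d_params((1, 3, 1), c, c)
            + conv3d_params((1, 1, 3), c, c))    # three factorized 1D kernels

print(full, factored)  # 110592 vs 36864
```

Replacing a full 3D kernel with three 1D kernels cuts the weight count by a factor of three (27 vs 9 taps per channel pair), which matters for memory-hungry 3D volumes.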
Data simulation for supervised training: Complex particles p(r)=s(r)exp(iφ(r)) were generated with amplitude s(r) modeled as a superellipsoid with parameters (a,b,c,n) sampled to span diverse shapes, and phase φ(r) modeled by a 3D Gaussian-correlated random field with correlation lengths (Lx, Ly, Lz). Particles were randomly rotated. Diffraction intensities were computed via Fourier Transform and only amplitudes retained for input; real-space amplitude and (scaled, shifted) phase served as labels. A dataset of 30,000 3D patterns (input size 64×64×64) was created; 95% for training, 5% for validation.
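The particle-generation recipe above can be sketched as follows. This is a minimal toy version: the grid size, parameter values, sampling ranges, and function names are mine, not the paper's, and the random rotation step is omitted.

```python
import numpy as np

def make_particle(n_grid=32, a=0.6, b=0.5, c=0.4, n=2.5, corr_len=4.0, seed=0):
    """Toy simulated particle: superellipsoid amplitude s(r) and a
    Gaussian-correlated random phase phi(r), plus its diffraction intensity."""
    rng = np.random.default_rng(seed)
    ax = np.linspace(-1, 1, n_grid)
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")

    # Superellipsoid support: |x/a|^n + |y/b|^n + |z/c|^n <= 1
    s = (np.abs(x / a) ** n + np.abs(y / b) ** n + np.abs(z / c) ** n) <= 1.0
    s = s.astype(float)

    # Gaussian-correlated phase: white noise filtered by a Gaussian in Fourier space
    q = np.fft.fftfreq(n_grid)
    qx, qy, qz = np.meshgrid(q, q, q, indexing="ij")
    gauss = np.exp(-0.5 * (corr_len * 2 * np.pi) ** 2 * (qx**2 + qy**2 + qz**2))
    noise = rng.standard_normal((n_grid,) * 3)
    phi = np.fft.ifftn(np.fft.fftn(noise) * gauss).real
    phi *= np.pi / np.max(np.abs(phi))  # scale phase into [-pi, pi]

    particle = s * np.exp(1j * phi)
    # Zero-pad to twice the particle size before the FT, so the diffraction
    # array is oversampled and the inversion stays overdetermined
    intensity = np.abs(np.fft.fftn(particle, s=(2 * n_grid,) * 3)) ** 2
    return particle, intensity
```

The network input would be the amplitude (square root of `intensity`); the half-size real-space labels correspond to the 2× oversampling in the zero-padded transform.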
Supervised training: The loss combined real- and reciprocal-space constraints: Ls = (1/(α1+α2+α3)) [α1 L1(Ap,Ag) + α2 L2(Φp,Φg) + α3 L3(Ip,Ig)], where L1 and L2 are relative RMSE terms for amplitude and phase in real space, and L3 is 1−Pearson correlation between predicted and ground-truth diffraction intensities to handle large dynamic range. Weights α1=α2=α3=1. Training used PyTorch with two optimizers alternating every 25 epochs over 100 epochs: ADAM and SGD (initial LR 0.01, decayed by 0.95 every 25 epochs). Early stopping was applied to avoid overfitting. Hardware: 256 GB RAM, two NVIDIA Quadro V100 GPUs.
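The supervised loss above can be written out directly. This is a hedged NumPy sketch of the stated formula (relative RMSE for the L1/L2 terms, 1 − Pearson for L3, equal weights); the function names are mine.

```python
import numpy as np

def rel_rmse(pred, truth):
    """Relative RMSE, used for the real-space amplitude (L1) and phase (L2) terms."""
    return np.sqrt(np.mean((pred - truth) ** 2)) / np.sqrt(np.mean(truth ** 2))

def one_minus_pearson(pred, truth):
    """L3 term: 1 - Pearson correlation, robust to the large dynamic
    range of diffraction intensities."""
    p, t = pred.ravel() - pred.mean(), truth.ravel() - truth.mean()
    return 1.0 - np.dot(p, t) / (np.linalg.norm(p) * np.linalg.norm(t))

def supervised_loss(amp_p, amp_g, phi_p, phi_g, int_p, int_g,
                    a1=1.0, a2=1.0, a3=1.0):
    """Ls = (a1*L1 + a2*L2 + a3*L3) / (a1 + a2 + a3)."""
    return (a1 * rel_rmse(amp_p, amp_g)
            + a2 * rel_rmse(phi_p, phi_g)
            + a3 * one_minus_pearson(int_p, int_g)) / (a1 + a2 + a3)
```

A perfect prediction drives all three terms, and hence the loss, to zero; the Pearson term keeps weak high-Q intensities from being swamped by the bright central speckles.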
Unsupervised learning (refinement and ab initio): Given only measured 3D diffraction intensity Im(Q), the network predicts a complex object po(r); its Fourier amplitude defines Ip(Q)=|FT{po(r)}|^2 (after zero-padding of po to match input size). The unsupervised loss Lu = (1/(β1+β2)) [β1 L3(Ip,Im) + β2 L4(Ip,Im)], combining Pearson correlation (L3) and χ² error (L4). β1 followed a modified Weibull schedule to transition from emphasizing correlation to balancing both terms: β1 = a0 − a1/(k/λ + 1), with k=1, λ=0.5 and epoch-dependent scaling so β1 decreases from 10 to 1 across training; β2=1. Optimization alternated ADAM and SGD every 200 epochs (initial LR 0.006, decayed by 0.95 every 200 epochs). Two initializations were used: transfer learning from the supervised-trained model, and random initialization (untrained). For Bragg CDI data, shear-distortion corrections were handled by converting predicted results from detector to laboratory coordinates after zero-padding.
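The unsupervised loss and weight schedule can be sketched as below. Note this encodes one plausible reading of the modified-Weibull schedule (the exact epoch scaling is not fully specified above), chosen so that β1 decreases from 10 toward 1 as stated; all names and the schedule's parameterization are assumptions.

```python
import numpy as np

def beta1_schedule(epoch, b_start=10.0, b_end=1.0, lam=0.5):
    """One plausible reading of beta1 = a0 - a1/(k/lam + 1): taking
    a0 = b_end and a1 = -(b_start - b_end), beta1 falls monotonically
    from b_start at epoch 0 toward b_end as training proceeds."""
    return b_end + (b_start - b_end) / (epoch / lam + 1.0)

def one_minus_pearson(pred, meas):
    """L3: 1 - Pearson correlation between calculated and measured intensities."""
    p, m = pred.ravel() - pred.mean(), meas.ravel() - meas.mean()
    return 1.0 - np.dot(p, m) / (np.linalg.norm(p) * np.linalg.norm(m))

def chi_squared(pred, meas):
    """L4: normalized chi-squared error."""
    return np.sum((pred - meas) ** 2) / np.sum(meas ** 2)

def unsupervised_loss(int_p, int_m, epoch, b2=1.0):
    """Lu = (b1*L3 + b2*L4) / (b1 + b2), emphasizing correlation early
    and balancing both terms late in training."""
    b1 = beta1_schedule(epoch)
    return (b1 * one_minus_pearson(int_p, int_m)
            + b2 * chi_squared(int_p, int_m)) / (b1 + b2)
```

Early in training the correlation term dominates (β1 ≈ 10), steering the network toward the right speckle structure; later the χ² term gains equal weight and tightens the absolute intensity fit.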
Experimental data and iterative baseline: Bragg CDI measurements of individual SrTiO3 and BaTiO3 (101) and Au and Pd (111) nanocrystals were collected at APS 34-ID-C (9 keV, coherent beam focused to ~630×470 nm^2). 3D intensity volumes were obtained by rocking scans and recorded on a Medipix detector (55 µm pixels), then represented in laboratory coordinates. An iterative phase-retrieval baseline followed Robinson & Harder’s scheme: start from inverse FT of measured amplitudes with random phases in [−π,π], support initially half the input array in each dimension; 50 ER iterations, then alternate 50 HIO (β=0.9) and 50 ER; apply shrink-wrap every 10 iterations after 100 iterations; total 2000 iterations; final results converted to laboratory coordinates.
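The ER/HIO updates at the core of the iterative baseline can be sketched in a few lines. This is a minimal dimension-agnostic toy (2D here for speed) of the standard algorithms; shrink-wrap, the full 2000-iteration schedule, and the laboratory-coordinate conversion are omitted.

```python
import numpy as np

def fourier_project(g, amp_meas):
    """Replace Fourier amplitudes with the measured ones, keeping phases."""
    G = np.fft.fftn(g)
    return np.fft.ifftn(amp_meas * np.exp(1j * np.angle(G)))

def er_step(g, amp_meas, support):
    """Error Reduction: zero the density outside the support."""
    return fourier_project(g, amp_meas) * support

def hio_step(g, amp_meas, support, beta=0.9):
    """Hybrid Input-Output: feedback update outside the support."""
    gp = fourier_project(g, amp_meas)
    return np.where(support, gp, g - beta * gp)

# Toy run: start from measured amplitudes with random phases, 50 ER,
# then alternate blocks of 50 HIO and 50 ER, as in the baseline scheme
rng = np.random.default_rng(0)
obj = np.zeros((32, 32)); obj[12:20, 10:22] = 1.0
amp = np.abs(np.fft.fftn(obj))
support = np.zeros_like(obj, dtype=bool); support[8:24, 8:24] = True

g = np.fft.ifftn(amp * np.exp(1j * rng.uniform(-np.pi, np.pi, amp.shape)))
for _ in range(50):
    g = er_step(g, amp, support)
for _ in range(3):
    for _ in range(50):
        g = hio_step(g, amp, support)
    for _ in range(50):
        g = er_step(g, amp, support)
err = np.linalg.norm(np.abs(np.fft.fftn(g)) - amp) / np.linalg.norm(amp)
```

The alternation matters in practice: ER alone stagnates in local minima, while HIO's feedback term lets the iterate escape them, which is exactly the sensitivity the ML approach aims to sidestep.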
Performance metrics and analysis: Training/validation loss curves assessed convergence. Reconstruction accuracy was evaluated by χ² and Pearson correlation rp between measured and calculated diffraction intensities. Fourier Spectral Weight (FSW) analyses integrated reconstructed diffraction amplitude over shells of constant Q to compare spatial-frequency content across methods. Inference time and optimization time per epoch were recorded.
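The FSW analysis reduces to binning diffraction amplitude over shells of constant |Q|. A simple radial-shell sketch (binning scheme and shell count are my choices, not the paper's):

```python
import numpy as np

def fourier_spectral_weight(intensity, n_shells=16):
    """Integrate diffraction amplitude over shells of constant |Q|,
    giving a 1D spatial-frequency profile for comparing reconstructions."""
    amp = np.sqrt(intensity)
    grids = np.meshgrid(*[np.fft.fftfreq(n) for n in intensity.shape],
                        indexing="ij")
    q = np.sqrt(sum(g ** 2 for g in grids))
    edges = np.linspace(0.0, q.max() + 1e-12, n_shells + 1)
    shell = np.digitize(q.ravel(), edges) - 1
    return np.bincount(shell, weights=amp.ravel(),
                       minlength=n_shells)[:n_shells]
```

Comparing the resulting 1D profiles from two reconstructions reveals whether one method systematically loses high spatial-frequency (fine-feature) content, which is how the transfer-learning and random-initialization results were shown to be equivalent.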
Key Findings
- Supervised 3D CNN training achieved a validation loss of ~0.031 after 100 epochs with early stopping, indicating high agreement between predicted and ground-truth amplitude/phase and their diffraction intensities. Inference is fast (~9 ms per 3D pattern on the reported hardware), enabling real-time CDI applications.
- On simulated test data with parameters outside the training distribution, the pretrained CNN alone produced a moderate error (~0.13). Unsupervised transfer learning refinement reduced the error dramatically to ~2×10^-6 on a noise-free pattern. The optimization required ~28.67 ms per epoch, totaling ~3.19 hours for 4×10^4 epochs.
- Unsupervised reconstruction from random initialization (no pretraining) converged to essentially the same final solution quality as transfer learning; pretraining primarily accelerated convergence. Fourier Spectral Weight analysis showed no noticeable differences between the two approaches across spatial frequencies.
- On four experimental Bragg CDI datasets (SrTiO3, BaTiO3, Pd, Au nanocrystals), the CNN in unsupervised mode produced reconstructions whose calculated diffraction intensities closely matched measured data, confirming high reconstruction accuracy. Similar results were obtained from random initialization, demonstrating ab initio phase retrieval capability.
- Reproducibility study (100 runs) on an experimental dataset showed both the untrained CNN and conventional iterative method led to multiple solutions with similar χ² error (≈0.0241 ± 0.0005 SD). The Pearson correlation was slightly higher for ML (rp≈0.9922) than iterative (rp≈0.9915). ML reconstructions qualitatively exhibited sharper features and better-defined facets than iterative results.
- Overall, the combination of supervised learning for rapid estimates and unsupervised optimization for refinement provides accuracy comparable to or better than state-of-the-art iterative algorithms, with added robustness and flexibility via customized loss functions.
Discussion
The research addresses the challenge of non-unique, noisy-data phase retrieval in CDI by introducing a 3D CNN that integrates data-driven priors (via supervised learning) with physics-based consistency (via unsupervised loss on measured intensities). Supervised training enables immediate, high-quality reconstructions useful for real-time experiments (e.g., XFEL single-shot imaging), while the unsupervised loss refines or directly retrieves solutions without ground truth, mitigating training-data scarcity. The unsupervised approach’s combination of Pearson correlation and χ² increases sensitivity to weaker diffraction features beyond χ² alone, improving fit quality across dynamic range. Demonstrations on simulated and diverse experimental Bragg CDI datasets show that the method achieves high fidelity, with calculated intensities matching measurements and reconstructions exhibiting sharp, well-defined morphology and phase. The ability to converge from random initialization to comparable solutions confirms that the network architecture, coupled with an appropriate loss, acts as a powerful phase-retrieval optimizer, reducing reliance on large supervised datasets and expert-tuned iterative schemes. Reproducibility tests indicate comparable χ² performance to conventional methods but with slightly better correlation and visually sharper features, suggesting improved reconstruction quality. These results broaden CDI applicability to challenging, asymmetric, and noisy cases and open avenues for integrating flexible, task-specific loss functions.
Conclusion
A comprehensive ML framework for 3D CDI phase retrieval was presented, combining a supervised 3D encoder–decoder CNN for rapid inversion with an unsupervised optimization mode that refines or directly retrieves phases from measured intensities. The approach delivers immediate, accurate reconstructions suitable for real-time experiments and achieves final accuracies comparable to or surpassing conventional iterative methods. Notably, unsupervised learning succeeds even without pretraining, with pretraining mainly improving convergence speed. The flexibility to design loss functions that better capture experimental statistics enhances robustness to noise and weak features. Future work could extend tailored loss functions (e.g., likelihood models), broaden training to more complex structures and noise models, and exploit the hybrid supervised–unsupervised paradigm for other phase retrieval and imaging modalities.
Limitations
- Supervised learning performance depends on the quantity and diversity of training data; limited datasets reduce generalization and may miss subtle features, necessitating subsequent refinement.
- Unsupervised optimization from random initialization converges more slowly than transfer learning; high-accuracy refinement can be computationally intensive (e.g., ~3.19 hours for 4×10^4 epochs on the reported hardware).
- Ground-truth experimental datasets for supervised training are scarce, constraining the scope of purely supervised approaches.
- Validation was demonstrated on four experimental Bragg CDI cases; broader generalization to other materials, beam conditions, and different CDI modalities remains to be established.
- For Bragg CDI, coordinate transformations (shear corrections) are required to compare with laboratory coordinates, adding preprocessing/postprocessing complexity.