Single-ended recovery of optical fiber transmission matrices using neural networks
Y. Zheng, T. Wright, et al.
This work targets ultra-thin multimode optical fiber imaging for high-resolution medical endoscopes. Authors Yijie Zheng, Terry Wright, Zhong Wen, Qing Yang, and George S. D. Gordon present a neural network-based method that corrects optical distortion in the fiber, enabling rapid and robust image reconstruction.
Introduction
The work addresses the challenge of reconstructing accurate transmission matrices (TMs) for ultra-thin multimode fiber (MMF) endoscopes without distal access, a key barrier to practical in vivo imaging because fiber TMs change with bending and temperature. Existing single-ended approaches using multi-wavelength reflection measurements and special reflector stacks can, in principle, recover the forward TM, but current iterative solvers are computationally expensive and scale poorly with TM size. The authors propose using neural networks to directly recover complex-valued TMs from three reflection matrices measured at different wavelengths, enabling rapid, single-ended calibration that is robust to global phase ambiguity. The goal is to enable fast, accurate TM recovery suitable for real-time imaging, including for non-square TMs and in the presence of modest fiber perturbations.
Literature Review
Prior single-ended TM recovery used wavelength-dependent reflector stacks and nonlinear optimization, but incurred large computation times and required conditions such as distinct reflector eigenvalues. Alternative acceleration strategies include compressed sampling of TMs for fibers with many modes, look-up tables with reflective beacons, and extended Kalman filtering for faster retrieval. Deep learning has been applied to imaging through MMFs (transmission and reflection) for rapid inference but typically degrades under fiber perturbations and often ignores phase or uses amplitude-only losses, leading to ambiguity without reflection calibration. Interferometric referencing can preserve relative phase but global phase remains arbitrary and can drift, causing conventional loss functions (e.g., MSE/MAE) to over-penalize solutions differing only by a global phase. Hence, a phase-insensitive training objective is needed to robustly learn complex TMs from reflection-mode data.
Methodology
Overview: The method recovers a forward TM Aλ from three reflection matrices Cλ at distinct wavelengths using neural networks. The measurement model is Cλ = Aλ Rλ Aλ^T, where Rλ are reflector matrices at the distal facet with wavelength-dependent behavior. The network predicts the complex TM (up to a global phase) from concatenated real-valued encodings of the three measured reflection matrices.
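The measurement model can be made concrete with a few lines of NumPy. The sketch below is illustrative only: the function name `reflection_matrix`, the 4×4 size, and the Gaussian random matrices are assumptions, not the authors' code. It shows why the forward TM is at best recoverable up to a global phase: a phase φ applied to A appears as 2φ on C, so A and −A produce identical reflection data.

```python
import numpy as np

def reflection_matrix(A, R):
    """Round-trip reflection matrix C = A R A^T (plain transpose, no conjugate)."""
    return A @ R @ A.T

rng = np.random.default_rng(0)
n = 4  # small size for illustration; the summary uses 64
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

C = reflection_matrix(A, R)

# A global phase phi on A shows up as exp(2j*phi) on C ...
phi = 0.3
C_rot = reflection_matrix(np.exp(1j * phi) * A, R)
phase_doubles = np.allclose(C_rot, np.exp(2j * phi) * C)

# ... so A and -A (phi = pi) give exactly the same reflection matrix.
sign_ambiguous = np.allclose(reflection_matrix(-A, R), C)

print(phase_doubles, sign_ambiguous)
```

In an experiment the measured phase of C can also drift, which is a separate reason to avoid penalizing global phase during training.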
Data generation (simulation): 64×64 complex non-unitary TMs were synthesized to emulate realistic fiber behavior (sparsity in an appropriate basis, dominant diagonal with sub-diagonals, condition numbers ~3–5). TMs were constructed via SVD of a random tri-diagonal matrix with reassigned singular values (0.5–2.5). Wavelengths used for characterization: λ1=850 nm, λ2=852 nm, λ3=854 nm. Three complex random reflector matrices Rλ were generated (distinct eigenvalues with high probability). Reflection matrices Cλ were computed as Cλ = Aλ Rλ Aλ^T. Inputs/outputs were converted to real-valued representations (complex matrices mapped to doubled real dimensions), normalized to [−1,1]. Datasets were split into training/validation/test (e.g., up to 900,000 train, 200,000 validation, 100,000 test; specific architecture runs also used 500,000 training examples for FCNN and 400,000 for U-Net).
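A minimal sketch of one training example, following the steps described above. Several details are assumptions made for illustration: the helper names `synth_tm` and `to_real`, the uniform draw of singular values, the block form of the real-valued encoding, the per-matrix normalization, and the use of a single TM for all three wavelengths (wavelength dependence of A is ignored here).

```python
import numpy as np

def synth_tm(n, rng, s_min=0.5, s_max=2.5):
    """Non-unitary TM from the SVD of a random complex tridiagonal matrix."""
    band = np.zeros((n, n), dtype=complex)
    for k in (-1, 0, 1):  # sub-diagonal, main diagonal, super-diagonal
        m = n - abs(k)
        band += np.diag(rng.standard_normal(m) + 1j * rng.standard_normal(m), k)
    U, _, Vh = np.linalg.svd(band)
    # Reassign singular values inside [s_min, s_max] (uniform draw is an assumption).
    s = np.sort(rng.uniform(s_min, s_max, n))[::-1]
    return (U * s) @ Vh  # equals U @ diag(s) @ Vh

def to_real(Z):
    """Assumed encoding: complex (n, n) -> real (2n, 2n) block matrix in [-1, 1]."""
    X = np.block([[Z.real, -Z.imag], [Z.imag, Z.real]])
    return X / np.abs(X).max()

rng = np.random.default_rng(0)
n = 64
A = synth_tm(n, rng)

# One random complex reflector per wavelength (850, 852, 854 nm in the summary).
reflectors = [rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
              for _ in range(3)]

x = np.stack([to_real(A @ R @ A.T) for R in reflectors])  # network input
y = to_real(A)                                            # network target

print(x.shape, y.shape, np.linalg.cond(A))
```

Because the singular values are confined to 0.5–2.5, the condition number of A cannot exceed 5, which matches the upper end of the range quoted above.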
Neural network architectures:
- FCNN: Ten-layer dense network (8 hidden layers); first and last hidden layers with 32,768 neurons, others with 8192; LeakyReLU activation; batch normalization between dense layers; dropout 0.2 after first two dense layers; two skip connections. Input: flattened concatenation of three 128×128 real-valued reflection matrices (size 49,152×1). Output: flattened 128×128 real-valued TM (16,384×1). Trained with Adam (lr 0.004, decay 1e-4), 2500 epochs; representative training time ~182.5 h on NVIDIA Tesla V100 (TensorFlow 2.0).
- Convolutional U-Net: Encoder–decoder with seven Conv2D and seven DeConv2D layers, two MaxPooling and two UpSampling layers, LeakyReLU activations, batch normalization between layers, dropout 0.2 after the second and penultimate Conv layers, three skip connections. Input: 128×128×3 (three reflection matrices as channels). Output: 128×128×1 TM. Trained with Adam (lr 0.004, decay 1e-4), 2200 epochs; representative training time ~143 h.
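The layer sizes quoted for the FCNN allow a rough parameter count. The arithmetic below assumes a plain sequential chain of dense layers (input, eight hidden layers, output) and ignores batch-normalization parameters and any extra weights introduced by the skip connections, so it is an estimate rather than the authors' reported figure.

```python
# Layer widths read from the FCNN description: input, 8 hidden layers, output.
widths = [3 * 128 * 128, 32768] + [8192] * 6 + [32768, 128 * 128]

# Each dense layer has (fan_in * fan_out) weights plus fan_out biases.
weights = sum(a * b for a, b in zip(widths, widths[1:]))
biases = sum(widths[1:])
total = weights + biases

print(f"input size: {widths[0]}")
print(f"dense parameters: {total:,}")
print(f"float32 storage: {total * 4 / 1e9:.1f} GB")
```

Under these assumptions the dense layers alone hold roughly three billion parameters, more than half of them in the first layer, which helps explain why the convolutional U-Net is so much lighter.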
Custom global phase-insensitive loss: To handle the degeneracy in global phase, the loss aligns the predicted and target complex matrices by an estimated global phase factor before computing the error, and adds L2 regularization (coefficient 1e-4). This avoids penalizing solutions that differ from the target only by a global phase and improves convergence over MAE/MSE.
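One way to realize such a loss is sketched below in NumPy. The summary does not state how the global phase is estimated, so the estimator used here (the angle of the complex inner product between prediction and target, which is the least-squares optimal alignment) and the normalization by mean target magnitude are assumptions. The L2 regularization term is omitted.

```python
import numpy as np

def phase_insensitive_loss(pred, target):
    """Mean absolute error after removing the best-fit global phase.

    phi = angle(sum(conj(pred) * target)) minimises
    ||exp(1j*phi) * pred - target||_2 over phi.
    Returned as a fraction of the mean target magnitude (assumed normalisation).
    """
    phi = np.angle(np.vdot(pred, target))  # vdot conjugates its first argument
    aligned = np.exp(1j * phi) * pred
    return np.mean(np.abs(aligned - target)) / np.mean(np.abs(target))

def plain_mae(pred, target):
    """Conventional MAE with the same normalisation, for comparison."""
    return np.mean(np.abs(pred - target)) / np.mean(np.abs(target))

rng = np.random.default_rng(1)
T = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
P = np.exp(1j * 0.7) * T  # the correct TM, offset by a global phase of 0.7 rad

loss_aligned = phase_insensitive_loss(P, T)  # close to zero
loss_mae = plain_mae(P, T)                   # large, although P is equivalent to T

print(loss_aligned, loss_mae)
```

For use in training, the same expression would need to be written with differentiable tensor operations in the chosen framework.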
Imaging simulations:
- Widefield (λ4≈854.5 nm): Pixel basis at distal end; reconstruct images via (A†)^{-1}(A†X + noise) with Gaussian noise power ≈2% of target.
- Confocal scanning: LP mode basis from a simulated MMF (core radius 30 μm, length 1.5 m, NA 0.24); 64 modes used to form spots; scan 128×128 positions; compute fraction of power in the focal region; reconstruct images by integrating reflected power per spot.
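The widefield step can be sketched as follows. The summary writes the reconstruction as (A†)^{-1}(A†X + noise); the code assumes the intended reading is that the image propagates through the true matrix and is inverted with the network's estimate. The names `T_true` and `T_rec`, the function name, and the near-identity test matrix are illustrative assumptions.

```python
import numpy as np

def widefield_reconstruct(T_true, T_rec, X, noise_power_frac, rng):
    """Propagate X through T_true, add noise, then invert with T_rec."""
    y = T_true @ X.ravel()
    noise = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
    # Scale the noise so its power is a fixed fraction of the signal power.
    scale = np.sqrt(noise_power_frac * np.sum(np.abs(y) ** 2)
                    / np.sum(np.abs(noise) ** 2))
    noise = scale * noise
    x_hat = np.linalg.solve(T_rec, y + noise)
    return x_hat.reshape(X.shape), y, noise

rng = np.random.default_rng(2)
n = 64
# Well-conditioned stand-in for a 64x64 transmission matrix.
T = np.eye(n) + 0.02 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
X = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))  # 8x8 complex image

X_clean, _, _ = widefield_reconstruct(T, T, X, 0.0, rng)   # perfect TM, no noise
X_noisy, y, noise = widefield_reconstruct(T, T, X, 0.02, rng)  # 2% noise power

noise_ratio = np.sum(np.abs(noise) ** 2) / np.sum(np.abs(y) ** 2)
print(np.allclose(X_clean, X), noise_ratio)
```

Passing a recovered TM as `T_rec` in place of `T` shows how TM recovery error propagates into the reconstructed image.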
Perturbation robustness test: Reflection matrices at λ1 and λ2 from a fixed TM; λ3 partially from a different TM by swapping final columns at rates 2/64 to 32/64; recover TM and evaluate widefield reconstructions.
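The column-swap perturbation can be sketched as below. The helper name `swap_final_columns` and the Gaussian random matrices are assumptions for illustration; only the third wavelength's reflection matrix is computed from the perturbed TM, as described above.

```python
import numpy as np

def swap_final_columns(A, B, k):
    """Return a copy of A whose last k columns are taken from B."""
    out = A.copy()
    if k > 0:  # guard: a slice of -0 would select every column
        out[:, -k:] = B[:, -k:]
    return out

rng = np.random.default_rng(4)
n = 64
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # fixed TM
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # different TM
R3 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # reflector at λ3

k = 4  # 4 of 64 columns swapped
A_pert = swap_final_columns(A, B, k)
C3 = A_pert @ R3 @ A_pert.T  # perturbed reflection matrix at the third wavelength

print(f"perturbation rate: {k / n:.2%}")
```

A swap of 4 of 64 columns corresponds to 6.25%, which is in line with the ~6% tolerance reported in the findings.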
Non-square TMs: Demonstrated recovery for rectangular TMs (e.g., 6×12) with appropriate C and R dimensions, exploring over-/under-constrained regimes.
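The dimension bookkeeping for a rectangular TM follows directly from C = A R A^T. The sketch below assumes A has shape p×d, so R is d×d and C is p×p; the orientation and the naive count of real unknowns versus real measurements are illustrative assumptions (the count ignores symmetry in C and the global phase).

```python
import numpy as np

def constraint_count(p, d, n_wavelengths=3):
    """Naive count of real unknowns in A (p x d) vs real entries in the C matrices."""
    unknowns = 2 * p * d                      # real and imaginary parts of A
    measurements = n_wavelengths * 2 * p * p  # real and imaginary parts of each C
    return unknowns, measurements

p, d = 6, 12  # assumed orientation of the 6x12 example
rng = np.random.default_rng(3)
A = rng.standard_normal((p, d)) + 1j * rng.standard_normal((p, d))
R = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
C = A @ R @ A.T  # shape (p, p)

unknowns, measurements = constraint_count(p, d)
print(C.shape, unknowns, measurements)
```

Under this crude count, the problem moves from over- to under-constrained as d grows relative to p, which is the regime distinction mentioned above.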
Computational scaling: Empirical scaling of minimum training data, memory usage, and convergence time vs. image dimension M showed roughly quadratic growth for both architectures. Prediction time compared to iterative methods was orders of magnitude lower.
Experimental validation: Cross-validated on measured TMs (488 nm) across 164 bending conformations. To improve domain transfer and prevent overfitting to simulated priors, retraining steps included augmentation with random matrices and fine-tuning on a small subset of experimental TMs. Final model achieved similar errors on simulated and experimental data.
Key Findings
- TM recovery accuracy (simulation): Both FCNN and U-Net recover 64×64 complex TMs with average loss ≤4% on test data using the phase-insensitive loss; MAE loss fails to converge. FCNN achieves lower average loss than U-Net by ~0.7% but uses far more parameters; U-Net converges ~20% faster and uses ~1000× fewer parameters.
- Speed: Once trained, inference takes on the order of 1 s, compared with ~1920 s of iterative optimization for a 12×12 TM (a reported speedup of ~4500×); the iterative method can achieve ≤0.5% loss, but at high computational cost.
- Widefield imaging (λ4≈854.5 nm): Three 8×8 complex images (amplitude-only, phase-only, random complex) reconstructed with IMMAE ≤9% and SSIM ≥83% using recovered TMs.
- Perturbation robustness: With simulated column-swap perturbations at λ3, the method tolerates up to ~6% perturbation with TM average loss ≤8% (std ≤0.82%), IMMAE ≤19%, SSIM ≥76%; performance degrades beyond this.
- Confocal imaging: Using recovered TMs and 64 modes, average power in focus was 48.5% vs. 49.9% for the target/ideal case. Confocal images achieved IMMAE ≤5% and SSIM ≥90% (3 percentage points lower error and 4 points higher SSIM than widefield).
- Non-square TM: Demonstrated recovery of a 6×12 TM with average loss 3.96% (std 0.38%); highlights potential for over-/under-constrained cases depending on dimensions.
- Reflector conditioning: Recovery remained effective for reflector matrices with 1, 2, or 6 distinct eigenvalues, achieving average losses 5.02% (std 0.43%), 4.96% (0.41%), and 3.88% (0.37%) respectively, indicating compatibility with various reflector conditions.
- Experimental validation: After domain adaptation, average loss was 3.38% (std 0.48%) for simulated and 3.42% (std 0.57%) for experimental TMs, indicating applicability to realistic fiber conformations.
Discussion
The study demonstrates that neural networks can rapidly and reliably recover forward TMs from single-ended, multi-wavelength reflection measurements, addressing the central challenge of in vivo calibration for ultra-thin MMF endoscopes without distal access. By resolving the global phase degeneracy through a custom loss, the models converge where conventional losses fail, enabling accurate complex TM recovery suitable for image reconstruction tasks. The results show strong performance across both widefield and confocal modalities and robustness to modest fiber perturbations, supporting practical deployment scenarios. Compared to iterative solvers, the method offers orders-of-magnitude speedup, making real-time or near-real-time recalibration feasible. The capability to handle non-square TMs and various reflector conditions broadens applicability to systems with different proximal/distal bases and realistic reflector variability. Cross-validation with experimental TMs confirms domain transferability when combined with data augmentation and limited fine-tuning, underscoring the method’s practicality.
Conclusion
The authors present a neural network approach (FCNN and U-Net) with a global phase-insensitive loss to recover 64×64 complex TMs from three reflection matrices at distinct wavelengths, achieving ≤4% average loss in simulation and ~3.42% on experimental TMs after domain adaptation. The recovered TMs enable accurate imaging: widefield IMMAE ≤9% with SSIM ≥83% and confocal IMMAE ≤5% with SSIM ≥90% and 48.5% focused power. The method is ~4500× faster at inference than iterative optimization, robust to modest TM perturbations (~6%), compatible with non-square TMs, and tolerant of varied reflector eigenvalue structures. Future directions include: scaling to larger TMs with memory-efficient architectures (e.g., U-Nets, autoencoders/latent compression), improved data efficiency (domain transfer, generative/adversarial or adaptive loss strategies), broader wavelength dispersion modeling, and standardized or well-characterized reflector fabrication to ease simulation–experiment alignment.
Limitations
- Data and memory demands: Training for large TMs (e.g., 1024×1024) requires very large datasets (>10 million examples) and substantial memory (≥1 TB), especially for FCNNs; U-Nets reduce but do not eliminate this burden.
- Accuracy vs. speed trade-off: Iterative methods can achieve lower loss (~0.5%) at the cost of much longer runtimes; the NN approach typically yields ≤4% average loss.
- Perturbation tolerance: Robust up to ~6% simulated column swaps; performance degrades for larger perturbations, suggesting sensitivity to severe mid-characterization TM changes.
- Domain shift: Models trained purely on simulations may overfit to simulated priors; domain adaptation (random matrices, fine-tuning on experimental data) is needed for best experimental performance.
- Slight performance differences across architectures: U-Net's average loss is ~0.7% higher than the FCNN's, though it offers better parameter and memory efficiency.