Learning-based real-time imaging through dynamic scattering media


H. Liu, F. Wang, et al.

This study, by Haishan Liu, Fei Wang, Ying Jin, Xianzheng Ma, Siteng Li, Yaoming Bian, and Guohai Situ, introduces a learning-based technique for real-time, non-invasive imaging through dense, dynamic scattering media, demonstrating image quality and speed beyond conventional imaging methods.

Introduction
Classical image formation assumes that the spatial-spectral information carried by light is not heavily distorted during propagation. In scattering environments (e.g., biological tissues, haze, fog, turbid water), this assumption fails, causing severe image degradation: speckle under coherent illumination or loss of contrast under incoherent illumination. The problem of imaging through scattering media remains a central challenge in optical imaging due to its importance for applications ranging from biomedicine to remote sensing. Traditional solutions either isolate early-arriving (ballistic or least-scattered) photons or boost signal-to-noise ratio via gating (e.g., Kerr effect), coherence/polarization selection, spectrum-matched illumination, absorption-based noise suppression, or spatial filtering in 4f systems. However, ballistic light decays exponentially with optical thickness, limiting depth and restoration quality. Computational approaches, particularly deep learning, can leverage both early and some multiply scattered light to improve imaging. Yet prior learning-based studies often use artificial, quasi-static, or homogeneous media (e.g., ground glass, polystyrene slabs, emulsions) and invasive ground-truth acquisition (SLM/DMD bypassing the scatterer), leading to mismatched transmission matrices under dynamic, inhomogeneous real media and datasets that poorly reflect real object properties. This work addresses these limitations with a learning-based method, DescatterNet, designed for incoherent imaging through non-static, inhomogeneous media and evaluated on real-world scenarios.
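To make the depth limitation concrete: the ballistic (unscattered) component follows Beer-Lambert attenuation. This standard relation is assumed here to connect the paper's optical-thickness values to signal levels; the symbols I_0, I_b, mu_s, and L are our notation, not the paper's.

```latex
% Beer-Lambert decay of the ballistic (unscattered) component.
% I_0: incident intensity, I_b: ballistic intensity after the medium,
% \mu_s: scattering coefficient, L: geometric path length,
% \mathrm{OT} = \mu_s L: optical thickness (absorption neglected).
I_b = I_0 \, e^{-\mu_s L} = I_0 \, e^{-\mathrm{OT}}
```

At the optical thickness of roughly 5.5 reported later in this summary, only about e^{-5.5} ≈ 0.4% of the incident light remains unscattered, which is why purely ballistic-photon techniques lose ground so quickly with depth.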
Literature Review
Prior art includes gating and selection techniques that isolate early-arriving photons or suppress multiple-scatter noise: Kerr-effect temporal gating, coherence and polarization selection, spectrum tailoring of the illumination, absorption-based suppression, and spatial filtering in 4f systems. While effective at shallow optical depths, these approaches are constrained by the exponential decay of ballistic light with increasing optical thickness. Computational imaging methods have incorporated late-arriving scattered light, with deep learning showing promise for difficult inverse problems. Physics-enhanced neural networks (PhysenNet) avoid training data by embedding forward models, but accurate forward models for general thick, dynamic scattering media are often intractable, thin-layer cases being notable exceptions. Previous learning-based demonstrations frequently relied on artificial scattering media and invasive ground-truth acquisition, in which an SLM or DMD is coherently illuminated with the scatterer removed, yielding datasets that match neither real object reflectance nor dynamic, inhomogeneous conditions. These gaps motivate a method that learns from realistic, non-invasive data and generalizes across varying scattering conditions and scenes.
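Schematically, physics-enhanced approaches such as PhysenNet trade training data for measurement consistency. The objective below is a generic sketch of that idea rather than the paper's (or PhysenNet's exact) formulation; H denotes an assumed-known forward scattering model, R_theta the network, and I the measurement.

```latex
% Generic untrained, physics-embedded objective (illustrative notation):
% the network R_\theta is fitted so that pushing its output back through
% the forward model H reproduces the measured pattern I.
\hat{\theta} = \arg\min_{\theta}\ \bigl\| H\bigl(R_{\theta}(I)\bigr) - I \bigr\|_2^2
```

This formulation only helps when H can be written down accurately, which is precisely what becomes intractable for thick, dynamic media, motivating the data-driven route taken here.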
Methodology
The proposed approach, DescatterNet, targets incoherent imaging through non-static and inhomogeneous scattering media. The pipeline comprises three components:
(1) Dataset acquisition: a custom indoor experimental setup captures thousands of paired scattered/clear images under various scattering conditions. Ground-truth images are presented on an e-ink display to better match real object reflectance characteristics, and raw scattered patterns are recorded through dynamic scattering media (e.g., fat emulsion in water) at controlled concentrations.
(2) Domain-gap bridging and preprocessing: to enhance generalization to real-world objects and outdoor scenes not observed during training, a preprocessing stage is applied. The illustrated steps include background handling and contrast-limited adaptive histogram equalization (CLAHE) to normalize inputs across different scattering conditions, producing inputs Ir(x, y) with reduced domain shift (one plausible form is sketched below).
(3) Network design and optimization: the DescatterNet architecture is optimized and compared against alternatives (HNN, MulScaleCNN, UNet, AttentionUNet, SwinIR) in terms of parameter count, FLOPs, inference speed, and image quality (Corr, PSNR).
Training uses paired scattered/clear data at multiple scattering strengths (fat emulsion volumes V = 1.8, 2.4, 2.8, 3.2, and 3.6 ml, corresponding to increasing optical thickness); separate models can be trained per condition or on mixed data to evaluate generalization. Performance is assessed on unseen real objects and test images, with additional demonstrations on turbid water and natural fog scenes. Inference is real-time capable on an RTX 3090 GPU.
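The summary names background handling and CLAHE but not their parameters. The following is a minimal sketch of such a preprocessing stage using OpenCV; the function name, the simple background-subtraction step, and all parameter values (clip_limit, tile) are our assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def preprocess_scattered(raw, background=None, clip_limit=2.0, tile=(8, 8)):
    """Sketch of the domain-gap preprocessing: background handling + CLAHE.

    raw, background: single-channel frames (any numeric dtype).
    clip_limit, tile: illustrative CLAHE parameters, not the paper's values.
    """
    img = raw.astype(np.float32)
    if background is not None:
        # One plausible form of "background handling": subtract a reference
        # frame recorded without the object, clipping negatives to zero.
        img = np.clip(img - background.astype(np.float32), 0.0, None)
    # Rescale to 8-bit so CLAHE statistics are comparable across
    # scattering conditions (this normalization reduces the domain shift).
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    return clahe.apply(img)  # I_r(x, y), the normalized network input
```

Because the same equalization is applied regardless of emulsion concentration, inputs from different scattering strengths share similar contrast statistics, which is what lets one trained model transfer across conditions.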
Key Findings
- Qualitative and quantitative improvements over traditional enhancement methods: in highly scattering conditions (optical thickness ≈ 5.51 for V = 2.4 ml), raw images show extremely low contrast. Traditional methods yield low-quality reconstructions: Dark Channel Prior (Corr/PSNR ≈ 0.256–0.507 / 9.53–10.4 dB) and Retinex (≈ 0.312–0.366 / 7.81–8.0 dB). Learning methods perform substantially better: MulScaleCNN reaches Corr/PSNR ≈ 0.926–0.959 / 17.1–18.58 dB, while DescatterNet further improves or matches quality with Corr/PSNR ≈ 0.935 / 18.70 dB and ≈ 0.953 / 18.2 dB, recovering more high-resolution structural detail.
- Network comparison (Table 1) on a common test set (Corr and PSNR are computed as in the metric sketch after this list):
  • DescatterNet: 1.94 M parameters, 10.59 G FLOPs, 338.62 FPS (RTX 3090), Corr 0.8488, PSNR 18.00 dB.
  • HNN: 1433.12 M parameters, 16.86 G FLOPs, 38.60 FPS, Corr 0.6946, PSNR 15.25 dB.
  • MulScaleCNN: 1.41 M parameters, 6.38 G FLOPs, 80.43 FPS, Corr 0.8492, PSNR 17.93 dB.
  • UNet: 31.04 M parameters, 167.51 G FLOPs, 65.02 FPS, Corr 0.8432, PSNR 17.80 dB.
  • AttentionUNet: 34.88 M parameters, 203.83 G FLOPs, 49.95 FPS, Corr 0.8468, PSNR 17.83 dB.
  • SwinIR: 0.14 M parameters, 33.97 G FLOPs, 1.15 FPS, Corr 0.8218, PSNR 17.11 dB.
  DescatterNet delivers the highest inference speed with competitive or superior image quality and a small model size.
- Upper limit with increasing scattering strength: using fat emulsion volumes V = 1.8, 2.4, 2.8, 3.2, and 3.6 ml (optical thickness ≈ 4.2, 5.51, 6.35, 7.19, and 7.92, respectively), reconstructions degrade with increasing V. Images become completely corrupted between V ≈ 2.8–3.2 ml, indicating a practical upper limit under the tested setup.
- Cross-concentration generalization: a model trained at V = 2.4 ml generalizes across concentrations, with improved performance when trained on mixed concentrations. Example single-image results show DescatterNet improving raw Corr/PSNR from 0.6140 / 12.31 dB to 0.9229 / 19.34 dB; from 0.7219 / 11.29 dB to 0.9294 / 19.03 dB; and from 0.6541 / 11.29 dB to 0.8088 / 14.69 dB.
- Real-world applicability: experiments demonstrate seeing through turbid water and natural fog, supporting non-invasive, real-time, incoherent imaging of real-world objects.
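The Corr and PSNR figures above are reported without explicit definitions in this summary; the standard forms below (Pearson correlation coefficient, and peak signal-to-noise ratio for 8-bit images) are assumed.

```python
import numpy as np

def corr(recon, truth):
    """Pearson correlation coefficient between reconstruction and ground truth."""
    x = recon.ravel().astype(np.float64) - recon.mean()
    y = truth.ravel().astype(np.float64) - truth.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

def psnr(recon, truth, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    mse = np.mean((recon.astype(np.float64) - truth.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak**2 / (mse + 1e-12)))
```

Under these definitions, a Corr above 0.9 means the reconstruction is nearly linearly consistent with the ground truth, and the roughly 10 dB PSNR gap between Retinex and the learning-based results corresponds to about a ten-fold reduction in mean squared error.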
Discussion
The study addresses the long-standing problem of imaging through dynamic, inhomogeneous scattering media by leveraging learning-based reconstruction that utilizes both early-arriving and portions of multiply scattered light. DescatterNet overcomes key practical barriers faced by prior works: it is trained on more realistic scattered/clear pairs, employs preprocessing to reduce domain gaps across conditions, and uses an efficient architecture to enable real-time deployment. The results show marked improvements in structural fidelity and contrast over traditional enhancement methods and competitive advantages in speed and quality over several deep networks. The observed upper limit at higher optical thickness reflects current acquisition and model constraints rather than a fundamental physical bound. Improvements in sensor dynamic range, noise, quantum efficiency, illumination power, and architecture design, as well as better separation of early vs. late scattered components, are expected to push this boundary. The demonstrated cross-concentration generalization and reconstruction of unseen real objects indicate robustness, suggesting applicability to diverse real-world scenarios including turbid water and fog.
Conclusion
The paper introduces DescatterNet, a learning-based, real-time method for incoherent imaging through dynamic and inhomogeneous scattering media. By constructing realistic datasets, applying domain-gap-reducing preprocessing, and optimizing a compact, fast architecture, the method reconstructs high-quality images from severely degraded inputs. It outperforms classical enhancement techniques and matches or exceeds competing deep networks while achieving high throughput. Experiments validate applicability to real objects and outdoor-like conditions (turbid water, natural fog). Future directions include improving the acquisition system (higher dynamic range, lower noise sensors, stronger illumination), refining network design, and devising methods to better isolate or exploit early-arriving photons to extend performance at higher optical thickness.
Limitations
- Practical upper limit at higher scattering strengths: reconstructions fail between V ≈ 2.8–3.2 ml under the tested fat emulsion setup, reflecting current acquisition and processing constraints.
- Dependence on early-arriving photons: performance deteriorates as multiple scattering dominates; without improved separation or modeling of photon paths, recovery is limited.
- Training-data specificity: although preprocessing and mixed datasets improve generalization, models may remain sensitive to domain shifts across media types and dynamic conditions not represented in training.
- Incomplete affiliation details and some methodological specifics (e.g., exact acquisition geometry, parameter choices) are not fully described in the provided excerpt, potentially affecting reproducibility from this text alone.