Non-line-of-sight imaging with arbitrary illumination and detection pattern

Engineering and Technology

X. Liu, J. Wang, et al.

Xintong Liu, Jianyu Wang, Leping Xiao, Zuoqiang Shi, Xing Fu, and Lingyun Qiu introduce a Bayesian framework for non-line-of-sight imaging that yields high-quality reconstructions even from irregular measurement patterns, broadening the range of real-world applications.

Introduction
Non-line-of-sight (NLOS) imaging seeks to recover the albedo and surface normals of hidden objects from time-resolved photon measurements acquired with pulsed lasers and detectors such as single-photon avalanche diodes (SPADs). Typical systems use a relay surface off which light diffusely reflects before and after interacting with the hidden targets. Many established voxel-based reconstruction methods require dense, regularly gridded measurements across large relay areas. These requirements limit practical deployment when relays are small, irregular, or only sparsely accessible, and they increase acquisition time, which is problematic for dynamic applications (e.g., autonomous driving). The study asks whether high-quality NLOS scenes (albedo and surface normals) can be reconstructed from arbitrary, possibly coarse and irregular illumination/detection patterns, in both confocal and non-confocal setups. The proposed solution is a Bayesian framework with a new algorithm, CC-SOCR, that introduces a virtual confocal signal to complement incomplete measurements, aiming to reduce acquisition requirements and broaden applicability across relay geometries.
Literature Review
NLOS methods can be categorized by hidden surface representation: point-cloud, mesh-based, and voxel-based, with voxel-based approaches offering efficiency and fine reconstructions. Key voxel-based methods include: back-projection (BP) with rendering accelerations and filtering for noise reduction; the light-cone transform (LCT) and directional LCT (D-LCT) casting reconstruction as deconvolution in Fourier space (effective under confocal setups); frequency–wavenumber migration (F-K) leveraging wave equations (also confocal-focused). Transforming non-confocal data to confocal incurs non-negligible error. Phasor field (PF) methods model diffractive propagation and enable fast reconstructions, including low-latency videos with SPAD arrays. SOCR introduces priors on both target and signal for high-quality, low-noise reconstructions but assumes regular scanning. For non-planar relays, F-K and BP-type methods can be applied directly, while planar-only methods require signal shifting. Recent efforts explored sparse measurements: confocal circular scans and compressed sensing with sparse grids; single-shot tracking of moving targets is possible but fails for static scenes due to ill-posedness. These works highlight limitations of regular, dense sampling and motivate methods robust to arbitrary sampling patterns.
Methodology
Physical model: For an illumination point x_i and a detection point x_d, the transient intensity at time t is r(x_i,x_d)(t) = ∫_Ω f(x) n(x) δ(||x_i − x|| + ||x_d − x|| − ct) dx, where c is the speed of light and Ω is the hidden volume. Defining u(x) = f(x) n(x) makes the model linear in the unknown, r = Au, with the albedo given by ||u(x)|| and the surface normal by u(x)/||u(x)|| (undefined where the albedo is zero). Discretization forms T = Au over a voxel grid on Ω.
Measured and estimated signals: The measurements comprise M illumination/detection pairs (p_m, q_m), each with photon counts in the first 7 time bins. The noisy measured signal at these pairs is written here as b̃. To mitigate noise, an estimated signal b (a denoised approximation of the ideal measurement) is introduced and treated as a random vector in the Bayesian framework. The simulated signal at the measured pairs is A_s u.
Virtual confocal signal: To address the rank deficiency caused by sparse or irregular sampling, a virtual confocal signal d is introduced at regular focal points on an I×J grid in the plane z=0. The simulated confocal signal is A_d u. Where the measured and virtual sets overlap (the set C_common), consistency is enforced through the operators R_b(b,d) and R_d(b,d).
Bayesian formulation: u (target), b̃ (measured), b (estimated), and d (virtual confocal) are treated as random vectors, and the MAP estimate maximizes P(u,b,d|b̃). Under the stated assumptions the posterior factorizes into a likelihood P(b̃|u,b) ∝ exp(−||b − b̃||^2 − Γ(u,b,b̃)), a joint prior P(u,b) ∝ exp(−||A_s u − b||^2 − τ(u,b)), and a conditional P(d|u,b) ∝ exp(−||R_b(b,d) − R_d(b,d)||^2 − λ_g||A_d u − d||^2 − ξ(u,d)). Taking the negative logarithm gives an optimization objective that combines the data-fit terms ||A_s u − b||^2 and ||A_d u − d||^2, the consistency term on overlaps, and the regularizations Γ, τ, and ξ, which are designed for signal–object collaboration.
Regularizations (Methods): τ(u,b) encodes sparsity and non-local self-similarity of the albedo together with an l0 prior on b, using data-driven tight frames and BM3D-style block matching over 3D albedo blocks with learned orthogonal transforms. Γ(u,b,b̃) applies temporal Wiener filtering to the measured signal b̃, with penalties that couple patches of Au and b via ratios of temporal derivatives and DCT-domain coefficients, weighted by the noise level. ξ(u,d) promotes sparse, shared representations of local 3D patches (2D space × 1D time) between d and A_d u, using learned dictionaries and l0 sparsity on the coefficients and on d. The concrete optimization problem minimizes jointly over u, b, d and the transform coefficients (Ci, Si, Qi), with weights λ balancing data-fit and prior terms; constraints enforce orthogonality and dictionary structure.
Optimization and complexity: The problem is solved by alternating minimization over subproblems (detailed in Supplementary Note 2). Convergence is guaranteed for the individual subproblems but not globally, because the updates of u are approximate. Memory complexity is O(max(N^3, M N)) and time complexity per iteration is O(max(N^3, M N^3)); for N×N confocal measurements (M = N^2) this gives O(N^5) time and O(N^3) memory, matching SOCR. Using coarser virtual grids reduces complexity (e.g., a √N×√N virtual grid for O(N) measurements). The method is amenable to GPU acceleration and parallelization. Extensions include virtual non-confocal signals and multiple virtual planes, at the cost of increased complexity.
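The forward model above can be illustrated with a small, standard-library-only sketch. It shows two things stated in the text: how u = f n splits back into albedo and normal, and how the delta function assigns each voxel to the time bin where ||x_i − x|| + ||x_d − x|| = ct. Everything else is an assumption made for illustration. The function names are hypothetical, and the scalar kernel (projecting u onto the direction from the voxel to the detection point) is a stand-in chosen only so that the map stays linear in u; the summary does not specify the paper's actual kernel or any distance falloff.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def albedo_and_normal(u):
    """Split u = f * n into albedo f = ||u|| and unit normal n = u / ||u||."""
    f = math.sqrt(sum(c * c for c in u))
    if f == 0.0:
        return 0.0, None  # the normal is undefined where the albedo is zero
    return f, tuple(c / f for c in u)


def time_bin(x_i, x_d, x, bin_width):
    """Index of the time bin in which light travelling x_i -> x -> x_d arrives."""
    path = math.dist(x_i, x) + math.dist(x_d, x)
    return int(path / (C * bin_width))


def simulate_transient(voxels, u_values, x_i, x_d, bin_width, n_bins):
    """Histogram for one illumination/detection pair; linear in u_values."""
    hist = [0.0] * n_bins
    for x, u in zip(voxels, u_values):
        k = time_bin(x_i, x_d, x, bin_width)
        d = math.dist(x_d, x)
        if k >= n_bins or d == 0.0:
            continue  # outside the recorded window, or voxel on the relay point
        # ASSUMED kernel (not from the source): project u onto the unit
        # direction from the voxel towards the detection point.
        w = tuple((a - b) / d for a, b in zip(x_d, x))
        hist[k] += sum(uc * wc for uc, wc in zip(u, w))
    return hist


# Two hidden voxels facing the relay plane z = 0, one non-confocal pair.
voxels = [(0.0, 0.0, 0.5), (0.1, 0.0, 0.6)]
u_values = [(0.0, 0.0, -0.8), (0.0, 0.0, -0.5)]
hist = simulate_transient(voxels, u_values, (0.0, 0.0, 0.0), (0.2, 0.0, 0.0),
                          bin_width=32e-12, n_bins=256)
print([k for k, h in enumerate(hist) if h != 0.0])
```

Stacking such histograms over all measured pairs would play the role of A_s u, and stacking them over the regular virtual grid with x_i = x_d would play the role of A_d u, under the same illustrative kernel.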
Key Findings
- The proposed CC-SOCR reconstructs both albedo and surface normals from arbitrary illumination/detection patterns (confocal and non-confocal), including irregular and limited relay regions (fences, shutters, window edges, sticks, letter-shaped and heart-shaped relays), with low background noise and fine detail.
- Synthetic non-confocal pyramid (36 points; 36 confocal + 1260 non-confocal pairs; 32 ps): CC-SOCR accurately localizes the target without background noise, achieving a maximum depth error of 0.02 m versus 0.12 m for LOG-BP. Classification error (excessive/missing voxels) is 2.86% for CC-SOCR vs 21.75% for LOG-BP.
- Confocal measured statue (Stanford dataset, sub-sampled 64×64; 32 ps): With a shutter-like relay (21 rows, 1344 focal points; ~18.46 s acquisition vs ~56.25 s for 64×64), CC-SOCR yields faithful reconstructions, while LOG-BP is noisy and F-K, D-LCT, and SOCR are blurry or artifact-prone.
- Confocal statue with very coarse sampling (10×10 focal points in 2×2 m²; ~1.37 s): CC-SOCR correctly localizes targets and preserves details; competing methods (LOG-BP, F-K, D-LCT, SOCR with nearest-neighbor interpolation) are noisy, blurry, or artifact-laden.
- Confocal statue across varied irregular relays: 200 random points; five vertical bars (1344 points); letters N, L, O, S (825 points); sparse sticks (1229 points); heart-shaped (258 points). CC-SOCR performs robustly; notably, in the heart-shaped case, only CC-SOCR correctly locates the target.
- Non-confocal measured data (phasor field dataset, 64×64 illuminations; 16 ps): Under subsets (vertical/horizontal bars, 14×14 uniform, 200 random), CC-SOCR and SOCR reconstruct targets; however, SOCR exhibits artifacts or loses details, influenced by interpolation bias. CC-SOCR provides faithful reconstructions in all cases, outperforming PF-BP, PF-RSD, and LOG-BP (which show noise/artifacts).
- Irregular and non-planar relay settings (Stanford letters with retroreflective bias): CC-SOCR directly handles irregular, non-planar illumination regions and correctly locates targets in challenging cases (e.g., oval-shaped non-planar region) where other methods fail after shifting and interpolation.
- Acquisition efficiency: Coarse sampling suffices under CC-SOCR (e.g., 10×10 or sparse patterns), substantially reducing acquisition time compared to dense grids while maintaining quality.
- The virtual confocal signal is critical: Without it, reconstructions may be blurry or artifact-prone; with it, CC-SOCR robustly converts non-confocal signals to confocal counterparts and regularizes ill-posed cases.
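The point counts and acquisition times quoted in the findings can be cross-checked with a few lines of arithmetic. The sketch below assumes that acquisition time scales linearly with the number of scanned focal points, which is consistent with the quoted figures but is not stated explicitly in the summary; the function name and its default arguments are illustrative.

```python
def acquisition_time(n_points, full_points=64 * 64, full_time_s=56.25):
    """Scale the dense-scan time by the fraction of focal points kept.

    Assumes a constant dwell time per focal point (linear scaling).
    """
    return full_time_s * n_points / full_points


# Shutter-like relay: 21 rows of a 64-column grid.
shutter_points = 21 * 64
print(shutter_points, round(acquisition_time(shutter_points), 2))  # 1344 18.46

# Very coarse 10 x 10 sampling.
print(round(acquisition_time(10 * 10), 2))  # 1.37

# Synthetic pyramid: 36 relay points give 36 confocal pairs and
# 36 * 36 - 36 ordered non-confocal pairs.
n = 36
print(n, n * n - n)  # 36 1260
```

Under this assumption the shutter pattern needs about a third of the dense-scan time, and the 10×10 pattern about 2.4% of it.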
Discussion
The study addresses the core challenge of practical NLOS imaging: dependence on dense, regular sampling over large relay areas. By introducing a virtual confocal signal within a Bayesian signal–object collaborative framework, CC-SOCR overcomes rank deficiency and leverages complementary priors on measured/estimated signals and targets. This enables accurate albedo and surface normal reconstructions from arbitrary, sparse, and irregular relay configurations, including non-planar surfaces, thereby broadening NLOS applicability to realistic relays (e.g., shutters, fences, frames). Compared to SOCR, CC-SOCR removes the need for spatially regular measurements, applies temporal filtering rather than spatial correlation on measured signals, and couples priors to both measurements and virtual confocal data. Empirical results across synthetic and measured datasets demonstrate superior robustness and fidelity relative to LOG-BP, LCT/D-LCT, F-K, PF-BP/PF-RSD, and SOCR, particularly when interpolation would otherwise introduce bias. The framework thus directly advances reconstruction quality and acquisition efficiency, answering the research question by showing high-quality reconstructions under arbitrary illumination/detection patterns.
Conclusion
This work introduces CC-SOCR, a Bayesian non-line-of-sight imaging framework that incorporates a virtual confocal signal and collaborative priors on signals and targets. It reconstructs albedo and surface normals with high fidelity under general, irregular, and sparse relay configurations (confocal and non-confocal), significantly reducing acquisition requirements and expanding applicability beyond large, regular relay surfaces. Extensive experiments validate superior performance over state-of-the-art methods, including in challenging non-planar and highly incomplete measurement scenarios. Future directions include reducing computational complexity via octree scene representations, GPU/parallel implementations, exploring virtual non-confocal signals and multi-plane virtual confocal signals to exploit spatial correlations, and further improving global convergence properties.
Limitations
- Computational burden: Memory complexity is O(max(N^3, M N)) and time complexity per iteration is O(max(N^3, M N^3)); even with confocal N×N sampling, time is O(N^5), comparable to SOCR. Adding multiple virtual planes or non-confocal virtual signals increases complexity.
- Convergence: While convergence of the subproblems is guaranteed, global convergence of the alternating optimization is not, due to approximate updates of the target.
- Model mismatch: Scenes with retroreflective materials (e.g., retroreflective letters) induce bias relative to the physical model used.
- Ill-posedness with very limited measurements necessitates the virtual confocal regularization; without it, reconstructions can be blurry or artifact-prone.
- Practical performance may depend on parameter selection for priors and dictionaries (guidance provided in supplementary notes).
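To make the growth rates in the first limitation concrete, the sketch below evaluates the two expressions exactly as quoted, ignoring constant factors. The function names are hypothetical, N is the grid-size parameter used in the text, and M is the number of measured pairs; the numbers show scaling only and are not runtime predictions.

```python
def memory_cost(n, m):
    """Growth rate of memory use, O(max(N^3, M N)), constants ignored."""
    return max(n ** 3, m * n)


def time_cost_per_iteration(n, m):
    """Growth rate of time per iteration, O(max(N^3, M N^3)), constants ignored."""
    return max(n ** 3, m * n ** 3)


n = 64
dense = time_cost_per_iteration(n, n * n)   # N x N confocal scan, M = N^2
coarse = time_cost_per_iteration(n, 10 * 10)  # 10 x 10 scan, M = 100

print(dense == n ** 5)                 # dense confocal case reduces to N^5
print(memory_cost(n, n * n) == n ** 3)  # and memory to N^3
print(dense / coarse)                   # ratio of the two growth terms
```

With M = N^2 both expressions collapse to the O(N^5) time and O(N^3) memory quoted for the confocal case, while fewer measured pairs shrink only the M-dependent term.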