
Engineering and Technology
Low-latency time-of-flight non-line-of-sight imaging at 5 frames per second
J. H. Nam, E. Brandt, et al.
Ji Hyun Nam, Eric Brandt, Sebastian Bauer, Xiaochun Liu, Marco Renna, Alberto Tosi, Eftychios Sifakis, and Andreas Velten present a multipixel time-of-flight NLOS imaging technique that reconstructs the 3D geometry of hidden objects. The system pairs fast-gated SPAD array detectors with a fast reconstruction method, enabling low-latency video capture of complex scenes containing natural, diffuse objects.
Introduction
The paper addresses the challenge of fast, robust, and scalable non-line-of-sight (NLOS) 3D imaging, where hidden objects are reconstructed via time-of-flight (ToF) measurements on a diffuse relay wall. Prior confocal ToF approaches with single-pixel sensors require dense wall scans, suffer from low light efficiency and contamination by first-bounce returns, and typically need long capture times (seconds to minutes) or rely on retroreflective targets. The authors aim to enable live, low-latency NLOS video of diffuse, natural scenes by maximizing photon efficiency through SPAD arrays and by devising a reconstruction pipeline whose SNR, motion blur, angular and depth resolution are largely independent of scene depth. They propose a system and algorithmic framework based on Phasor Field (PF) virtual wave optics, sparse illumination with virtual aperture remapping, and depth-dependent temporal averaging to achieve real-time reconstructions at 5 fps with low latency.
Literature Review
The paper situates its work within ToF NLOS imaging and alternative approaches. Confocal methods using fast f-k migration (FK) and light-cone transform (LCT) have shown high-quality reconstructions but require dense scans and long acquisition times, often using retroreflective targets for signal boost. Fast algorithms exist but typically assume confocal data and are light-inefficient. Some works demonstrated tracking of small targets or used deep networks trained on synthetic or long-scan data, without low-latency experimental results on diffuse scenes. Passive camera-based methods can infer angular or shadow-based information but lack true 3D reconstructions; thermal NLOS imaging in the infrared enables low latency without reconstruction for specular-like surfaces, and Doppler radar can track objects in real time without full 3D geometry. Recent demonstrations include very long standoff ToF NLOS (over 1.43 km) and theoretical analyses of feature visibility and resolution, as well as PF-based reconstruction showing wave-like properties and prior PF reconstructions. The authors note the need for SPAD arrays tailored to NLOS (fast gating, high temporal resolution, per-pixel readout), which are not met by existing commercial arrays, motivating their custom hardware.
Methodology
Overview: The system integrates custom fast-gated SPAD array detectors, sparse relay-wall scanning, virtual aperture remapping, PF reconstruction via fast Rayleigh–Sommerfeld diffraction (RSD), and depth-dependent frame averaging to achieve live NLOS video. A detailed SNR model accounts for Poisson and ambient noise propagated through the reconstruction operator.
Photon efficiency and reciprocity: Leveraging Helmholtz reciprocity, the authors analyze photon counts for scanning versus array capture. For a single SPAD scanning Q positions, expected photons per time bin scale as γ(t)LΛ/Q. Using a Q-pixel SPAD array capturing all positions simultaneously restores counts to γ(t)LΛ, enabling an inverse proportionality between required laser power and number of SPAD pixels. Increasing photon rate by Q improves SNR by sqrt(Q). Because diffuse hidden-scene returns illuminate the entire relay wall, array detectors are critical to harness most returning photons and reduce laser power or acquisition time.
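The sqrt(Q) SNR scaling claimed above follows directly from Poisson photon statistics and can be checked numerically. The sketch below is illustrative only (the pixel count of 28 and per-bin rate of 5 photons are hypothetical values, not taken from the paper's measurements):

```python
import numpy as np

def histogram_snr(rate_per_bin, n_pixels, n_trials=20000):
    """Empirical SNR (mean/std) of summed Poisson photon counts from
    n_pixels SPAD pixels, each expecting rate_per_bin photons per time bin."""
    rng = np.random.default_rng(0)
    counts = rng.poisson(rate_per_bin * n_pixels, size=n_trials)
    return counts.mean() / counts.std()

snr_1 = histogram_snr(5.0, 1)    # single scanning SPAD pixel
snr_q = histogram_snr(5.0, 28)   # 28-pixel array at the same laser power
# Poisson(lambda) has SNR = sqrt(lambda), so the ratio approaches sqrt(28) ~ 5.3
print(snr_q / snr_1)
```

Equivalently, dividing the laser power by Q while keeping Q pixels restores the single-pixel histogram SNR, which is the power-reduction argument made in the text.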
Signal falloff and SNR modeling: The cumulative NLOS signal from complete measurements does not strictly follow 1/r^4 at close ranges; near-field geometry yields weaker falloff (intuitively approaching 1/r^2 due to extended relay-wall illumination). Noise in NLOS occurs pre-reconstruction and is propagated through the reconstruction operator, making SNR depth-dependent. The authors derive an SNR model (Supplementary Note 3), incorporate PF’s inherent depth-dependent low-pass filtering (constant angular resolution), and apply depth-dependent temporal averaging to stabilize SNR over depth.
Sparse illumination and virtual aperture remapping: The relay wall is scanned with a sparse grid while multiple SPAD pixels observe different wall patches. A remapping operation transforms the sparsely sampled, multi-detector measurements into an equivalent dense virtual illumination grid with a single virtual sensing point, provided spatial shifts are small relative to scene distances and within the temporal jitter tolerance. The absolute round-trip path difference between physical and virtual configurations remains below the effective spatial uncertainty set by temporal jitter (~85 ps ≈ 2.55 cm), validating the approximation. This enables the use of the fast PF-RSD solver designed for dense, single-detector data without dense scans.
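The remapping validity criterion can be made concrete: shift the laser point by the sensor's offset from a single virtual sensing point, and verify the round-trip path difference stays below the jitter-limited spatial uncertainty. A minimal sketch, with hypothetical wall and scene coordinates chosen for illustration:

```python
import numpy as np

C_LIGHT = 3e8               # m/s
JITTER_S = 85e-12           # combined laser + SPAD jitter from the text
TOL_M = C_LIGHT * JITTER_S  # ~2.55 cm effective spatial uncertainty

def remap_path_error(laser, sensor, virtual_sensor, scene_pt):
    """Round-trip path difference (metres) between a physical (laser, sensor)
    pair and its remapped virtual configuration: laser shifted by the sensor
    offset, observed from a single virtual sensing point."""
    laser, sensor, virtual_sensor, scene_pt = (
        np.asarray(p, dtype=float)
        for p in (laser, sensor, virtual_sensor, scene_pt))
    virtual_laser = laser + (sensor - virtual_sensor)  # aperture remapping
    physical = (np.linalg.norm(scene_pt - laser)
                + np.linalg.norm(scene_pt - sensor))
    virtual = (np.linalg.norm(scene_pt - virtual_laser)
               + np.linalg.norm(scene_pt - virtual_sensor))
    return abs(physical - virtual)

# Relay wall in the z = 0 plane, hidden scene point 2 m behind it
err = remap_path_error([0.5, 0.3, 0], [0.1, -0.2, 0], [0, 0, 0], [0, 0, 2])
print(err < TOL_M)  # path error (~5 mm here) stays well below the tolerance
```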
Phasor Field reconstruction (RSD implementation): The PF approach treats transient light transport as virtual wave propagation. Data are binned into a Fourier Domain Histogram (FDH) over selected temporal frequencies; an RSD-based diffraction solver performs frequency-domain propagation (FFT, convolution with a precomputed kernel, inverse FFT) to form volumetric reconstructions. PF’s PSF maintains approximately constant angular resolution with depth, acting as a depth-dependent low-pass filter that improves SNR at larger distances compared to backprojection.
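The FDH step reduces each transient histogram to a handful of complex coefficients at the selected virtual frequencies. A minimal sketch of that projection, assuming histograms are already accumulated per scan point (the function name and array shapes are illustrative, not the authors' API):

```python
import numpy as np

def fourier_domain_histogram(transients, bin_width_s, wavelengths_m, c=3e8):
    """Project per-scan-point transient histograms onto a few virtual
    phasor-field frequencies instead of keeping full histograms.

    transients:    (n_points, n_bins) photon counts
    bin_width_s:   temporal bin width in seconds
    wavelengths_m: virtual wavelengths to keep (e.g. around 8 cm)
    returns:       (n_points, n_freqs) complex FDH coefficients
    """
    n_bins = transients.shape[1]
    t = np.arange(n_bins) * bin_width_s
    freqs = c / np.asarray(wavelengths_m)             # virtual frequencies, Hz
    basis = np.exp(-2j * np.pi * np.outer(freqs, t))  # (n_freqs, n_bins)
    return transients @ basis.T
```

Each coefficient's magnitude carries the photon count and its phase encodes arrival time, which is exactly what the RSD propagation step consumes; the real pipeline accumulates these sums incrementally with sine/cosine lookup tables.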
Depth-dependent frame averaging: To compensate for SNR loss with depth while controlling motion blur, the method averages more frames at larger depths, where apparent motion is slower and PF’s PSF widens. A target SNR level (set to that at the nearest depth) dictates a linearly depth-dependent number of averaged frames. Averaging is coherent: it is applied to the real and imaginary PF components before the magnitude is computed.
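The averaging rule can be sketched as follows: each depth slice averages a (linearly depth-dependent) window of past complex reconstructions, and magnitudes are taken only afterwards. The window rule `base_frames + d` is a simplified stand-in for the paper's SNR-matched schedule:

```python
import numpy as np

def depth_averaged_volume(frames, base_frames=1):
    """Coherently average complex PF reconstructions over a depth-dependent
    number of past frames: deeper slices average more frames, and the
    magnitude is taken only after averaging.

    frames: array of complex volumes shaped (n_frames, n_depths, H, W),
            newest frame last
    """
    frames = np.asarray(frames)
    n_depths = frames.shape[1]
    out = np.empty(frames.shape[1:], dtype=float)
    for d in range(n_depths):
        # simplified linear window, capped by the available history
        n_avg = min(len(frames), base_frames + d)
        avg = frames[-n_avg:, d].mean(axis=0)  # coherent: average complex values
        out[d] = np.abs(avg)                   # magnitude only after averaging
    return out
```

Note that coherent averaging lets out-of-phase noise cancel, which is why it outperforms averaging magnitudes directly.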
Hardware and acquisition: The system uses two custom 16×1 SPAD arrays (total 28 active pixels via TCSPC channel multiplexing), each with ~50 ps FWHM temporal resolution and 200 ns dead time, and fast gating with a 40 ns active window to suppress first-bounce wall returns. The laser is a OneFive Katana HP (532 nm, 35 ps pulse width, 700 mW, 5 MHz repetition). A HydraHarp 400 TCSPC unit in TTTR mode (8 ps time resolution) provides timing; 7 channels are assigned to the arrays. Signals from 4 SPAD pixels are time-multiplexed per channel using cable delays, partitioning each 200 ns repetition window into four segments ([0–40], [40–90], [90–140], [140–180] ns), yielding 7×4=28 pixels. Nikon 50 mm f/1.2 lenses focus the arrays on the relay wall; 532 nm bandpass filters (3 nm FWHM) reduce ambient light. The system standoff to the relay wall is ~2 m.
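The cable-delay multiplexing amounts to a simple demultiplexing rule on the timestamp: each TCSPC channel's 200 ns window is partitioned into the four segments listed above, and the segment identifies which of the four pixels fired. A sketch of that rule, where the channel-to-pixel numbering is a hypothetical convention (the segment boundaries come from the text):

```python
# Segment boundaries in ns within each 200 ns repetition window;
# gaps between segments act as guard intervals
SEGMENTS_NS = [(0, 40), (40, 90), (90, 140), (140, 180)]

def demux(channel, timestamp_ns):
    """Map a (TCSPC channel, arrival time) pair to a SPAD pixel index and a
    delay-corrected arrival time. Returns None for guard-interval hits."""
    for seg, (lo, hi) in enumerate(SEGMENTS_NS):
        if lo <= timestamp_ns < hi:
            pixel = channel * 4 + seg        # 7 channels x 4 segments = 28 pixels
            return pixel, timestamp_ns - lo  # undo the cable-delay offset
    return None
```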
Scanning and calibration: Two galvo mirrors (max 150 Hz) raster-scan a sparse 190×22 grid on the relay wall (vertical spacing 1 cm, horizontal 9 cm). Full sparse scans run at 5 fps (0.2 s exposure per frame; 0.2 s / (190×22) ≈ 48 μs per laser point). A confocal single-pixel SPAD aligned with the laser is used for geometric calibration of illumination points and SPAD observation areas on the wall.
Reconstruction parameters and constraints: The effective temporal jitter (laser+SPAD) is ~85 ps. The chosen PF virtual wavelength is 8 cm, balancing spatial resolution and stability given the temporal resolution (virtual wavelength must exceed ~2× the spatial blur from jitter). To avoid continuous-wave artifacts, multiple frequencies are processed to synthesize a virtual pulse.
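The wavelength choice above reduces to one line of arithmetic, spelled out here for clarity:

```python
C = 3e8                        # speed of light, m/s
jitter_s = 85e-12              # effective system jitter (laser + SPAD)
blur_m = C * jitter_s          # spatial blur from jitter: ~2.55 cm
min_wavelength_m = 2 * blur_m  # stability bound from the text: ~5.1 cm
chosen_wavelength_m = 0.08     # the 8 cm virtual wavelength actually used
print(chosen_wavelength_m >= min_wavelength_m)  # constraint satisfied
```

The 8 cm choice sits comfortably above the ~5.1 cm bound while keeping spatial resolution as fine as the jitter allows.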
Software pipeline: A multistage producer-consumer pipeline runs in C++ with CPU multithreading and CUDA acceleration: Acquisition (USB3 TTTR records), Parsing (channel demux, grid index, timing corrections), Binning (FDH accumulation using parallelized sine/cosine LUTs), Reconstruction (GPU FFT/convolution/inverse FFT with precomputed RSD kernel; coherent depth-dependent averaging), and Display (normalization, color mapping). The pipeline is acquisition-bound and sustains 5 fps with ~1 s end-to-end latency.
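The producer-consumer structure can be illustrated in miniature: independent stages connected by bounded queues, so a slow stage back-pressures its producer while fast stages overlap. The three stage functions below are hypothetical stand-ins for the real Parse/Bin/Reconstruct stages, not the authors' C++/CUDA code:

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """Run one pipeline stage: pull an item, process it, push it downstream."""
    while True:
        item = inbox.get()
        if item is None:       # sentinel: propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(fn(item))

# Hypothetical stand-ins for Parse -> Bin -> Reconstruct
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
queues = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
threads = [threading.Thread(target=stage, args=(f, qi, qo))
           for f, qi, qo in zip(stages, queues, queues[1:])]
for t in threads:
    t.start()

for record in [1, 2, 3]:       # acquisition produces records
    queues[0].put(record)
queues[0].put(None)

results = []
while (r := queues[-1].get()) is not None:
    results.append(r)
print(results)  # [1, 3, 5]
```

Because each stage runs in its own thread, end-to-end latency is set by the slowest stage plus queueing delay, which is why the real pipeline is acquisition-bound.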
Comparative evaluations: The same scenes are captured with non-confocal arrays and with a confocal single-pixel SPAD (128×128 scan). Non-confocal data are also approximately converted to confocal form to run LCT and FK baselines. Due to galvo limits, 0.2 s confocal scans are infeasible; confocal reconstructions require longer exposures for diffuse targets.
Key Findings
- Live NLOS video at 5 fps with ~1 s latency: The system captures and reconstructs dynamic, diffuse hidden scenes in real time using only 28 SPAD pixels, demonstrating a person manipulating objects (“NLOS letterbox” sequence) at 0.2 s exposure per frame.
- Depth-independent SNR via PF RSD and depth-dependent averaging: Simulations and real data show that while backprojection SNR drops rapidly with distance, PF-RSD’s inherent spatial averaging reduces SNR falloff, and with optimal depth-dependent frame averaging the SNR remains approximately constant until the target falls below the system’s resolution limit.
- Motion blur and resolution scale favorably with depth: PF provides near-constant angular resolution across depth (consistent with the Rayleigh criterion). Depth-dependent averaging leverages slower apparent motion at larger distances, maintaining similar observable motion speeds across depths.
- Virtual aperture remapping enables sparse scanning with array sensing: Remapped measurements closely approximate dense-scan, single-detector data within the temporal jitter tolerance (~85 ps ≈ 2.55 cm), producing reconstructions comparable to full scans while drastically reducing scan time.
- Photon efficiency scaling with array size: Using Q SPAD pixels either reduces needed laser power by ~Q for equivalent histogram SNR or improves SNR by sqrt(Q) at constant power, harnessing light returning to the entire relay wall.
- Hardware performance details: Sparse grid 190×22 over a 1.9 m×1.9 m relay-wall region; vertical spacing 1 cm, horizontal 9 cm; 5 fps; virtual wavelength 8 cm; hidden scene depth range 1–3.5 m (limited by TCSPC window); maximum motion velocity without noticeable blur ~0.4 m/s (from 200 ms exposure and ~8 cm spatial resolution at z≈1 m).
- Baseline comparisons: At short exposures (0.2–4 s), approximate LCT and FK on approximately confocalized non-confocal data appear blurry/noisy, whereas the proposed PF-RSD pipeline reconstructs the diffuse target successfully with 0.2 s exposure. Confocal methods generally require long exposures for diffuse targets and often use retroreflectors; galvo constraints prevent 0.2 s confocal scans in this setup.
- SNR analyses: Real and simulated experiments normalize SNR to its value at 1 m and show RSD SNR falloff is slower than backprojection; with depth-dependent averaging, SNR is roughly depth-invariant up to the resolution limit. Simulations extend to large distances (up to 500 m) demonstrating the trend.
Discussion
The study demonstrates that combining purpose-built SPAD arrays with PF-based fast diffraction reconstruction and virtual aperture remapping overcomes traditional light-efficiency and latency barriers in NLOS imaging. By capturing photons across multiple relay-wall patches simultaneously and remapping them to a virtual dense aperture, the system substantially increases photon usage and reduces required laser power. PF’s constant angular resolution and coherent depth-dependent averaging counteract depth-dependent SNR loss and motion blur, enabling reconstructions with nearly depth-invariant SNR until the resolution limit is reached. These capabilities address the core challenge of fast, robust 3D reconstruction of diffuse, non-retroreflective scenes and support real-world applications (e.g., navigation, search and rescue) where latency and safety (eye-safe powers) are critical. Comparative results indicate that traditional confocal pipelines and backprojection-based methods are disadvantaged at short exposures on diffuse scenes, underscoring the practical benefit of the proposed non-confocal, PF-RSD approach. The findings suggest scalability: increasing SPAD pixel counts and relay wall areas should proportionally improve SNR, speed, and stand-off distance while maintaining manageable computational latency through parallel pipelines and GPU acceleration.
Conclusion
The paper introduces a low-latency NLOS imaging system that achieves live 5 fps reconstructions of dynamic, diffuse hidden scenes using custom fast-gated SPAD arrays, virtual aperture remapping, and PF-based RSD reconstruction with depth-dependent averaging. It contributes: (1) an SNR model for NLOS that accounts for depth, noise propagation, and PF filtering; (2) a virtual remapping method enabling sparse scans with array sensing to emulate dense single-detector data; (3) a real-time hardware–software pipeline that is acquisition-bound and achieves ~1 s latency; and (4) empirical and simulated evidence that PF-RSD SNR is more stable with depth than backprojection and can be made approximately depth-invariant with averaging up to the resolution limit. Future work should scale to kilopixel or megapixel SPAD arrays to further improve SNR and reduce laser power, expand relay-wall apertures to enhance resolution, increase stand-off distances, optimize virtual frequency sets for speed and quality, and integrate more advanced motion handling and adaptive exposure strategies for faster target dynamics.
Limitations
- Depth range limited by hardware timing windows: The active gate (40 ns) and TCSPC time range constrain reconstructions to hidden-scene depths ~1–3.5 m in the reported setup.
- Spatial resolution constrained by temporal jitter: Effective system jitter (~85 ps) imposes a lower bound on the PF virtual wavelength (chosen 8 cm), limiting lateral resolution and maximum tolerable motion speed (~0.4 m/s at 0.2 s exposure).
- Scan speed constraints: Galvo mirror limits (150 Hz) prevent dense confocal scans at 0.2 s and bound the sparse scan rate to 5 fps; faster dynamics may introduce blur without adjusting exposure/averaging.
- Assumptions for remapping validity: Virtual aperture remapping accuracy depends on small spatial shifts relative to scene distances and is valid when path differences remain below the temporal resolution; large deviations could introduce artifacts.
- Dependence on diffuse relay surfaces and targets: The approach assumes near-Lambertian behavior for the relay wall and scene at 532 nm; highly specular or complex BRDFs may affect performance.
- Custom hardware requirement: Results rely on purpose-built fast-gated SPAD arrays and specific TCSPC multiplexing; availability and scalability to commodity sensors may vary.