Introduction
Non-line-of-sight (NLOS) imaging, the ability to image objects hidden from direct view, has significant potential applications in autonomous navigation, disaster response, and various other fields. While time-of-flight (ToF) methods have shown promise in 3D reconstruction of NLOS scenes, challenges remain in achieving high speed and reconstruction quality due to the inherently weak multibounce signals. Existing ToF NLOS methods often use confocal scanning with single-pixel sensors, resulting in low light efficiency and long acquisition times, typically in the range of minutes or tens of seconds. Live reconstructions have mostly been demonstrated using retroreflective surfaces, which provide significantly stronger signals than diffuse surfaces encountered in real-world scenarios. Other approaches, such as those employing deep neural networks or passive methods using shadows or indirect reflections, have limitations in 3D reconstruction capabilities or are restricted to specific scene types. This research addresses the need for a method that can achieve high-speed, high-quality 3D NLOS imaging of diffuse objects in complex scenes. The authors propose a solution that leverages the advantages of multipixel SPAD array detectors to efficiently capture returning photons and combines this with a novel fast reconstruction algorithm to overcome the SNR limitations of NLOS imaging and enable the creation of low-latency videos.
Literature Review
The authors review existing NLOS imaging techniques, highlighting their limitations. Time-of-flight (ToF) methods are noted as the most successful for reconstructing near room-sized scenes, but their reliance on confocal scanning with single-pixel sensors results in slow acquisition times (minutes to tens of seconds). Recent work has demonstrated ToF NLOS imaging at long range (1.43 km), showcasing the technique's potential, but these systems still struggle to deliver real-time, high-quality 3D reconstructions in everyday environments. The review also covers other methods, including deep neural networks, passive approaches exploiting shadows or indirect reflections, and thermal imaging in the infrared. These alternatives either lack full 3D reconstruction capability, are limited to specific scene types (e.g., active monitors), or operate in spectral ranges with their own constraints (e.g., thermal emission requirements). The use of single-photon avalanche diodes (SPADs) for NLOS imaging is discussed, emphasizing the efficiency gains of SPAD arrays that capture light returning from the entire relay wall, unlike single-pixel sensors. The authors draw an analogy to conventional cameras, where array detectors are standard practice for maximizing the captured signal.
Methodology
This research uses specially designed, fast-gated SPAD array detectors and a novel reconstruction algorithm based on the phasor field framework to address the signal-to-noise ratio (SNR) limitations of NLOS imaging. The core innovation lies in using multipixel SPAD arrays (two 16×1 arrays in this study), which capture photons from multiple locations simultaneously, improving light efficiency compared to single-pixel scanning. A sparse illumination scanning pattern reduces scanning time, and a remapping operation creates a virtual complete illumination grid, which allows the application of a fast Rayleigh-Sommerfeld diffraction (RSD) algorithm for reconstruction. This RSD algorithm, optimized for speed, is crucial for real-time processing. The authors derive a detailed SNR model that accounts for depth-dependent intensity falloff and noise propagation. This model informs the design of a depth-dependent frame averaging technique, which increases SNR at larger distances without introducing significant motion blur. The reconstruction pipeline captures raw photon streams with the SPAD arrays, remaps the data to a virtual full aperture, transforms it to the frequency domain (Fourier Domain Histogram, FDH), applies the fast RSD algorithm, and performs depth-dependent frame averaging to obtain the final image; the entire pipeline is designed for real-time operation. The experimental setup employs a pulsed laser (532 nm, 700 mW, 5 MHz repetition rate) and two custom-designed 16×1 SPAD arrays with high time resolution (50 ps FWHM) and individual pixel readout. The hardware integrates with a time-correlated single photon counting (TCSPC) unit and galvanometer mirrors for scanning the relay wall. The software implementation uses a multi-stage producer-consumer model with multiple CPU threads and GPU acceleration to handle real-time data processing and image reconstruction.
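The frequency-domain reconstruction described above (FDH followed by RSD propagation) can be sketched in a few lines of NumPy. The sketch below is illustrative only: the 1D relay-wall geometry, the 6 cm virtual wavelength, the single noiseless scatterer, and the assumption that the illumination delay has already been compensated are all simplifications, not the authors' implementation, which additionally remaps a sparse scan to a virtual full grid and runs on the GPU.

```python
import numpy as np

C = 3e8  # speed of light, m/s

def fdh_coefficient(hist, bin_width, wavelength):
    """Project each pixel's transient histogram onto a single
    virtual frequency: one column of the Fourier Domain Histogram."""
    t = np.arange(hist.shape[-1]) * bin_width
    f = C / wavelength  # virtual illumination frequency
    return hist @ np.exp(-2j * np.pi * f * t)

def rsd_backpropagate(phasors, wall_points, voxels, wavelength):
    """Propagate the monochromatic phasor field from the relay-wall
    sample points to hidden-scene voxels via the Rayleigh-Sommerfeld
    diffraction kernel exp(ikr)/r."""
    k = 2 * np.pi / wavelength
    # pairwise distances, shape (n_voxels, n_wall_points)
    r = np.linalg.norm(voxels[:, None, :] - wall_points[None, :, :], axis=-1)
    return (phasors[None, :] * np.exp(1j * k * r) / r).sum(axis=1)

# Toy scene: one scatterer 0.6 m behind a 1 m line of 32 wall points.
wall = np.stack([np.linspace(-0.5, 0.5, 32),
                 np.zeros(32), np.zeros(32)], axis=1)
target = np.array([0.0, 0.6, 0.0])
bin_width = 50e-12   # 50 ps bins, matching the stated SPAD jitter
wavelength = 0.06    # 6 cm virtual wavelength (an assumed value)

# Idealized noiseless transients: one count per pixel at the
# target-to-wall return time.
dists = np.linalg.norm(wall - target, axis=1)
hist = np.zeros((32, 256))
hist[np.arange(32), np.round(dists / C / bin_width).astype(int)] = 1.0

phasors = fdh_coefficient(hist, bin_width, wavelength)
voxels = np.stack([np.zeros(10), np.linspace(0.45, 0.9, 10),
                   np.zeros(10)], axis=1)
intensity = np.abs(rsd_backpropagate(phasors, wall, voxels, wavelength))
print(voxels[np.argmax(intensity), 1])  # brightest depth lies near 0.6 m
```

Because the RSD step reduces to matrix products at a single virtual frequency, it vectorizes well, which is what makes the GPU-accelerated real-time pipeline feasible.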
Key Findings
The study demonstrates the effectiveness of the proposed method through both simulations and real-world experiments. The SNR model predicts, and the experiments confirm, that the SNR remains relatively constant across a wide range of depths. Depth-dependent frame averaging is shown to significantly improve SNR at longer distances compared to backprojection methods. The system achieves real-time NLOS video capture and reconstruction at 5 frames per second (fps) with a latency of approximately 1 second. The authors present live NLOS videos of dynamic scenes featuring non-retroreflective objects, demonstrating the system's capability to capture complex movements. The spatial resolution of the reconstructions is consistent with theoretical expectations, indicating that resolution does not degrade drastically with distance, unlike methods using conventional backprojection. A comparison with other NLOS methods using confocal scanning demonstrates the significant speed advantage of the proposed method: confocal methods require much longer exposure times (seconds to minutes) to obtain comparable image quality, making them unsuitable for dynamic scene capture. Experiments with both diffuse and retroreflective objects highlighted the far higher signal from retroreflective surfaces (roughly 10,000 times stronger), emphasizing the practical value of a method that handles the inherently weak signals from diffuse objects.
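The depth-dependent frame averaging behind these SNR results can be illustrated with a small sketch: each depth slice averages a number of recent frames that grows with depth, trading temporal resolution for SNR where the signal is weakest. The power-law window n(z) ∝ (z/z_ref)^α, the exponent α, and the synthetic noise model below are illustrative assumptions; the paper derives the exact averaging schedule from its SNR model.

```python
import numpy as np

def depth_dependent_average(frame_buffer, depths, z_ref, n_ref=1, alpha=2.0):
    """Average a depth-dependent number of the most recent frames.

    frame_buffer : sequence of (n_depth, n_x) volumes, newest last
    depths       : (n_depth,) depth of each slice in metres
    Slice i averages n(z_i) = n_ref * (z_i / z_ref)**alpha frames
    (an assumed schedule), so deeper slices gain SNR at the cost
    of temporal resolution.
    """
    frames = np.asarray(frame_buffer)  # (n_frames, n_depth, n_x)
    n = np.clip(np.round(n_ref * (depths / z_ref) ** alpha).astype(int),
                1, len(frames))
    out = np.empty(frames.shape[1:])
    for i, ni in enumerate(n):
        out[i] = frames[-ni:, i].mean(axis=0)  # newest ni frames, slice i
    return out

# Toy stream: constant unit signal, noise std growing with depth.
rng = np.random.default_rng(1)
depths = np.array([0.5, 1.0, 2.0])
frames = [1.0 + 0.1 * rng.normal(0, depths[:, None] ** 2, size=(3, 8))
          for _ in range(32)]
avg = depth_dependent_average(frames, depths, z_ref=0.5)
```

Averaging n frames of independent noise reduces its standard deviation by a factor of sqrt(n), so the deepest slice here (16 frames averaged) ends up far less noisy than any single frame.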
Discussion
The results demonstrate a substantial advancement in NLOS imaging capabilities. The combination of specially designed SPAD arrays and efficient reconstruction algorithms overcomes previous limitations on speed and the size of reconstructable scenes. The depth-independent SNR and motion blur achieved by depth-dependent frame averaging are a critical improvement, suggesting scalability to much larger scenes and standoff distances. The real-time video demonstrations highlight the practical potential of the technology for applications requiring dynamic scene capture, such as robotics, autonomous navigation, and disaster response. The authors anticipate that future improvements in CMOS SPAD array technology (kilopixel and megapixel arrays) will further enhance the SNR, speed, and standoff distance of NLOS imaging systems.
Conclusion
This paper presents a significant advancement in NLOS imaging by combining custom-designed SPAD array detectors with a novel, fast reconstruction algorithm. The real-time (5 fps) video results with non-retroreflective objects demonstrate the method's feasibility for dynamic scene capture and various applications. The depth-independent SNR and resolution, achieved through depth-dependent frame averaging, highlight the scalability of the approach for larger scenes and greater standoff distances. Future research may explore improvements using higher-density SPAD arrays and further refinement of the reconstruction algorithms.
Limitations
The current system's spatial resolution is limited by the temporal resolution of the SPAD arrays and the virtual wavelength used in the Phasor Field reconstruction. The relatively small size of the SPAD array currently restricts the field of view. While the depth-dependent frame averaging mitigates the SNR reduction at larger distances, it introduces some blurring into the reconstruction. The current implementation might not be well-suited for scenes containing extremely rapid movements.