Introduction
Multi-shot computational imaging systems enhance imaging capabilities (super-resolution, phase retrieval, hyperspectral imaging) by computationally combining multiple raw measurements captured under different conditions. However, sample motion during sequential capture of dynamic scenes leads to blurry reconstructions and artifacts. Existing remedies either require static samples, restrict the types of samples that can be imaged, or depend on often-impractical hardware modifications, data-efficient algorithms, or deep learning with data priors, which can be difficult to generate and may fail on out-of-distribution samples.

This paper proposes a different approach: modeling the sample's dynamics during image reconstruction. This is challenging because each measurement is encoded differently, precluding simple image registration, and because motion can be complex and deformable, requiring pixel-level motion kernels. The authors leverage deep learning to build flexible motion models that are difficult to express analytically, building on earlier deep-learning applications in single-molecule localization microscopy.

The proposed neural space-time model (NSTM) recovers dynamic scenes by modeling spatiotemporal relationships in multi-shot image reconstruction, exploiting the temporal redundancy of scenes that evolve smoothly over time. NSTM uses two coordinate-based neural networks, one for motion and one for the scene, which store multi-dimensional signals in their weights. The motion network outputs motion kernels that estimate pixel-wise displacement, while the scene network generates the scene from motion-adjusted coordinates. Gradient descent trains the network weights by minimizing the difference between rendered and acquired measurements.
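The two-network data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the network widths, the random (untrained) weights, and the helper names (`mlp`, `render_frame`) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(widths):
    """Tiny random-weight MLP returning a forward function (illustration only)."""
    params = [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
              for a, b in zip(widths[:-1], widths[1:])]
    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:
                x = np.tanh(x)
        return x
    return forward

# Motion network: (x, y, t) -> per-pixel displacement (dx, dy).
motion_net = mlp([3, 32, 32, 2])
# Scene network: motion-adjusted (x, y) -> scene value at that point.
scene_net = mlp([2, 32, 32, 1])

def render_frame(coords_xy, t):
    """Evaluate the scene at time t by warping coordinates with the motion kernel."""
    txy = np.concatenate([coords_xy, np.full((len(coords_xy), 1), t)], axis=1)
    displacement = motion_net(txy)  # motion kernel for this frame
    return scene_net(coords_xy + displacement)

# A 16x16 grid of normalized spatial coordinates.
ys, xs = np.mgrid[0:16, 0:16] / 15.0
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
frame = render_frame(coords, t=0.5).reshape(16, 16)
```

In the actual method, `frame` would be passed through the system's forward model and compared against an acquired measurement, with gradients flowing back into both networks.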
A key innovation is a coarse-to-fine process that controls network output granularity, starting with low-frequency features and motion, gradually refining higher-frequency details and local deformable motion, addressing the issue of the scene network overfitting before motion is recovered. NSTM is a general model applicable to systems with differentiable forward models, requiring no pre-training or data priors. The paper demonstrates NSTM's application to differential phase-contrast microscopy (DPC), 3D structured illumination microscopy (SIM), and rolling-shutter DiffuserCam.
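One common way to realize the coarse-to-fine control described above in coordinate-based networks is to fade in the higher-frequency bands of a positional encoding over the course of training. The sketch below assumes this windowed-encoding mechanism; the paper's exact granularity-control scheme may differ, and the function name and schedule parameter `alpha` are illustrative.

```python
import numpy as np

def windowed_encoding(coords, num_freqs, alpha):
    """Positional encoding whose higher-frequency bands are smoothly
    faded in as alpha grows from 0 to num_freqs during training."""
    feats = []
    for k in range(num_freqs):
        # Window weight: 0 while band k is disabled, ramping smoothly to 1.
        w = 0.5 * (1.0 - np.cos(np.pi * np.clip(alpha - k, 0.0, 1.0)))
        feats.append(w * np.sin((2.0 ** k) * np.pi * coords))
        feats.append(w * np.cos((2.0 ** k) * np.pi * coords))
    return np.concatenate(feats, axis=-1)

coords = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
early = windowed_encoding(coords, num_freqs=4, alpha=1.0)  # only band 0 active
late = windowed_encoding(coords, num_freqs=4, alpha=4.0)   # all bands active
```

Early in training only the lowest-frequency band reaches the network, so it can only represent smooth structure and global motion; as `alpha` increases, finer detail and local deformation become expressible.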
Literature Review
The paper reviews existing methods for imaging dynamic samples in multi-shot computational imaging, highlighting their limitations. These methods either try to reduce acquisition time through hardware modifications (e.g., multiplexing measurements), develop more data-efficient reconstruction algorithms, or use data priors with deep learning techniques. However, the authors argue that these methods often present practical limitations, are system-specific, or rely on data priors that are difficult to obtain and can fail for out-of-distribution samples. The authors cite prior work on super-resolution, image restoration, and single-molecule localization microscopy, noting both the challenges and successes of applying deep learning to dynamic imaging. Cited examples span super-resolution microscopy (Huang et al., 2009; Gustafsson, 2008), quantitative phase imaging (Park et al., 2018), hyperspectral imaging (Lu & Fei, 2014), and single-molecule localization microscopy (Dertinger et al., 2009), including deep-learning approaches to the latter (Nehme et al., 2018; Saguy et al., 2023). The review emphasizes the novelty of their approach, which addresses the limitations of existing methods by explicitly modeling the sample's dynamics during the reconstruction process, rather than relying solely on faster acquisition or data priors.
Methodology
The core of the methodology is the neural space-time model (NSTM), which consists of two interconnected coordinate-based neural networks: a motion network and a scene network. The motion network takes spatiotemporal coordinates as input and outputs a motion kernel representing pixel-wise displacement. This kernel is then used to adjust the spatial coordinates before they are fed into the scene network. The scene network, using the adjusted coordinates, generates the reconstructed scene. The weights of both networks are jointly optimized using gradient descent, minimizing the difference between the rendered measurements (produced by passing the reconstructed scene through a forward model representing the imaging system) and the actual acquired measurements.

A crucial aspect of the methodology is the coarse-to-fine process, which controls the granularity of the network outputs during training: it starts by recovering low-frequency features and motion, then progressively refines higher-frequency details and local deformable motion. This mitigates the issue of the scene network overfitting to the measurements before the motion is accurately recovered.

The forward model is specific to the imaging system and is included as part of the loss function during training. The paper demonstrates NSTM on three imaging systems: DPC, 3D SIM, and rolling-shutter DiffuserCam. For DPC, the forward model uses linear transfer functions derived from previous work; for 3D SIM, a real-space forward model is implemented to avoid the static-scene assumption of conventional band-separation methods; and for rolling-shutter DiffuserCam, a discrete-time sum models the time integral of the scene convolved with the point spread function. The paper also describes implementation details, including network architecture (number of layers and width), optimizer (Adam), and learning-rate scheduling.
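As an illustration of one of the forward models above, the rolling-shutter DiffuserCam measurement can be approximated as a discrete-time sum in which each sensor row integrates the PSF-convolved scene only while that row's shutter is open. The sketch below is a toy 2D version with periodic boundaries and a simplified shutter schedule; the function name and the `exposure_lines` parameterization are assumptions, not the paper's implementation.

```python
import numpy as np

def rolling_shutter_forward(scene_stack, psf, exposure_lines):
    """Render a rolling-shutter measurement as a discrete-time sum:
    each row accumulates the PSF-convolved scene at the time steps
    during which that row's shutter is open (toy model)."""
    T, H, W = scene_stack.shape
    psf_f = np.fft.rfft2(psf, s=(H, W))
    measurement = np.zeros((H, W))
    for t in range(T):
        # Circular convolution of the scene at time t with the PSF.
        blurred = np.fft.irfft2(np.fft.rfft2(scene_stack[t]) * psf_f, s=(H, W))
        # Rows whose exposure window covers time step t contribute here.
        open_rows = np.abs(np.arange(H) - t * H / T) < exposure_lines
        measurement[open_rows] += blurred[open_rows]
    return measurement

scene = np.ones((4, 8, 8))               # 4 time points of an 8x8 scene
psf = np.zeros((8, 8)); psf[0, 0] = 1.0  # delta PSF: convolution is identity
measurement = rolling_shutter_forward(scene, psf, exposure_lines=2)
```

In the NSTM setting, `scene_stack` would be produced by querying the scene network at the motion-adjusted coordinates for each time step, so the whole rendering chain stays differentiable.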
For 3D SIM, an efficiency improvement is introduced by grouping measurements with identical orientation and phase across different depth planes, reducing computational cost. The conventional reconstruction methods for comparison are also specified, including Tikhonov regularization for DPC and a moving window approach for 3D SIM. The code and data are made available for reproducibility.
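The 3D SIM grouping idea above amounts to indexing the acquisition list by (orientation, phase) so that all depth planes sharing an illumination pattern can be rendered together. A minimal sketch, assuming a hypothetical acquisition schedule of 3 orientations, 5 phases, and 4 z-planes:

```python
from collections import defaultdict

# Hypothetical acquisition list: (orientation index, phase index, z-plane index).
acquisitions = [(o, p, z) for z in range(4) for o in range(3) for p in range(5)]

# Group measurements sharing the same (orientation, phase) across depth planes,
# so each group needs only one evaluation of the illumination pattern.
groups = defaultdict(list)
for o, p, z in acquisitions:
    groups[(o, p)].append(z)
```

Each of the 15 pattern groups then covers all 4 depth planes, reducing redundant forward-model computation relative to treating all 60 measurements independently.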
Key Findings
The NSTM demonstrates significant improvements in reconstructing dynamic scenes from multi-shot imaging data compared to conventional methods. In differential phase-contrast microscopy (DPC) of a moving *C. elegans* worm, NSTM removes motion artifacts and produces a clear reconstruction, along with an estimate of the worm's motion dynamics. In 3D structured illumination microscopy (3D SIM) of a dense microbead sample with induced vibration, NSTM resolves individual beads that were blurred in the conventional reconstruction, achieving quality comparable to a ground-truth reconstruction from a separate static acquisition.

Applying NSTM to live-cell 3D SIM imaging shows its power in disambiguating motion artifacts and recovering fine details that are blurred in conventional reconstructions. For instance, NSTM distinguishes a moving tubule from a branched mitochondrion, a distinction missed by conventional methods. It also recovers the dynamics of the endoplasmic reticulum (ER) network in live cells, resolving clear structures not visible in conventional reconstructions.

In rolling-shutter DiffuserCam lensless imaging, NSTM produces cleaner reconstructions of a dynamic scene than the original total-variation-regularized method, without over-smoothing. NSTM also achieves superior PSNR and SSIM in various simulated scenarios, demonstrating robustness to different types and magnitudes of motion. Simulation studies reveal its strengths and limitations: it handles large rigid-body or linear motion effectively, while performance degrades with large local deformable motion, and high-frequency vibrations are challenging due to the lack of temporal redundancy. Finally, the motion network can be queried directly to generate temporally interpolated videos at arbitrary resolutions.
Discussion
The results demonstrate the effectiveness of NSTM in addressing the long-standing challenge of motion artifacts in multi-shot computational imaging. NSTM's ability to jointly reconstruct the scene and its motion dynamics from the same dataset used for conventional methods is a significant advance. The absence of pre-training or data priors makes NSTM adaptable and generalizable to various multi-shot imaging systems. This is particularly valuable in biological imaging, where ground truth data may be unavailable or difficult to acquire. The coarse-to-fine approach effectively addresses the challenges associated with the joint optimization of motion and scene, leading to improved convergence and reconstruction quality. The findings highlight the potential of NSTM to enhance the temporal resolution of multi-shot systems, allowing for detailed observation of dynamic processes in living systems. The ability to recover motion maps offers additional insights into sample dynamics, providing a more complete understanding of the imaged scene.
Conclusion
The neural space-time model (NSTM) presented offers a significant improvement in handling motion artifacts in multi-shot computational imaging. Its ability to jointly estimate scene and motion dynamics without pre-training or data priors makes it a versatile tool applicable to a range of imaging modalities. Future research directions could focus on extending NSTM to handle more complex dynamics (e.g., appearing/disappearing features), improving computational efficiency, and exploring its application to other imaging techniques. The successful application of NSTM across diverse imaging modalities points towards its potential to become a standard method for high-quality dynamic imaging.
Limitations
NSTM's reliance on temporal redundancy (smooth motion and scenes that remain correlated over time) limits its applicability when motion is highly discontinuous, such as abrupt changes or features that appear and disappear; the two-network construction, while providing an explicit motion model, imposes a constraint that prevents recovery of such dynamics. The computational cost of NSTM is currently higher than that of conventional methods, although optimization techniques could reduce it. Finally, the accuracy of NSTM reconstructions degrades with increasing noise in the acquired measurements.