Video frame interpolation neural network for 3D tomography across different length scales

Engineering and Technology


L. Gambini, C. Gabbett, et al.

In this work, Laura Gambini and colleagues at Trinity College Dublin use a neural network for video-frame interpolation to enhance the through-plane resolution of tomographic images. By applying the method to several imaging modalities, the team shows how 3D tomography acquisition can be optimized while preserving quantitative information.

Introduction
The paper addresses the challenge of limited and anisotropic resolution in 3D tomography across length scales, where acquisition constraints (e.g., dose, speed, destructive sampling) hinder cubic-voxel reconstructions and the extraction of precise physical information. The authors ask whether image augmentation via neural video-frame interpolation can enhance through-plane resolution to recover cubic voxels, preserve information content, and relax acquisition constraints (e.g., fewer destructive FIB-SEM slices, reduced CT dose, shorter MRI scans). They motivate the problem with FIB-SEM nanotomography, where the in-plane (xy) resolution (~5 nm) is much finer than the z-direction slice thickness (10–20 nm), leading to anisotropic voxels and slow, potentially damaging acquisitions. They propose reframing the z-direction as a temporal axis and applying motion-aware frame interpolation to insert intermediate slices, thereby improving isotropy while maintaining the morphology needed for accurate physical metrics.
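Concretely, reframing the stack index as a time axis means each pair of adjacent slices is handed to the interpolation model as if they were consecutive video frames, and the returned intermediate images are interleaved into the volume. The sketch below is a minimal illustration of this idea, not the authors' released code; `rife_interp(a, b, t)` is a hypothetical wrapper around one inference call of a frame-interpolation model, returning the frame at fractional position t between slices a and b.

```python
import numpy as np

def augment_stack(stack, n_between, rife_interp):
    """Interleave n_between interpolated slices between every pair of
    adjacent slices, shrinking the effective through-plane spacing by a
    factor of (n_between + 1).

    stack: array of shape (n_slices, H, W).
    rife_interp(a, b, t): hypothetical stand-in for the interpolation
    model, returning the frame at fractional position 0 < t < 1 between
    slices a and b.
    """
    out = []
    for a, b in zip(stack[:-1], stack[1:]):
        out.append(a)
        for k in range(1, n_between + 1):
            out.append(rife_interp(a, b, k / (n_between + 1)))
    out.append(stack[-1])
    return np.stack(out)

# Example: a 5 x 5 x 15 nm^3 FIB-SEM voxel becomes near-cubic with two
# interpolated slices per gap; the CT experiment in the paper inserts three.
# isotropic = augment_stack(fib_sem_stack, n_between=2, rife_interp=my_rife)
```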
Literature Review
Conventional linear interpolation provides acceptable results only when structural variation between adjacent slices is very smooth; otherwise, it blurs edges and causes information loss. Optical-flow-based slice interpolation improves on linear methods but performs poorly at image borders, forcing cropping and loss of information. Deep-learning approaches for anisotropic 3D super-resolution have been proposed: training cross-sectional models to super-resolve along the milling direction assumes morphological equivalence across axes and can introduce bias; GAN-based 2D/3D fusion can generate plausible volumes but lacks uniqueness and can be ambiguous for quantitative reconstruction. In contrast, video-frame interpolation with deep learning is designed to reconstruct temporal evolution and improve perceptual quality; state-of-the-art Real-Time Intermediate Flow Estimation (RIFE) achieves strong accuracy and speed without auxiliary networks. The authors build on this by applying RIFE to tomographic stacks, anticipating that the model’s known limitations (determined by available frame information and motion complexity) transfer to tomography use cases.
Methodology
Overview: The tomographic through-plane (milling or slice) direction is treated as a temporal axis, and a video-frame interpolation network (RIFE) is applied to insert intermediate slices, aiming for cubic-voxel resolution while preserving morphological information.

Datasets:
- FIB-SEM printed graphene networks: 801 images (4041 × 510 px) at 5 nm in-plane resolution and 15 nm slice thickness (voxel size 5 × 5 × 15 nm³). A 100-image subset cropped to 510 × 510 px was used for computer-vision metrics and porosity; ten randomly selected sub-volumes (55–60% of the original volume) were used for tortuosity and effective diffusivity.
- MRI brain scans (sagittal): publicly available (Brainstorm), 1 mm³ isotropic voxels. Every other slice was removed to provide a ground-truth comparison.
- CT torso dataset (TCIA): 152 axial frames with a voxel size of 0.74 × 0.74 × 2.49 mm³; no ground truth available.

Interpolation models and settings:
- Primary: RIFE HD v4.6 (IFNet architecture, coarse-to-fine intermediate flow estimation, end-to-end learned flow, trained on Vimeo90K). Inference was adapted to accept sequences of arbitrary length and grayscale input. A fine-tuned RIFE (non-HD) model, trained on 1000 cropped graphene images (510 × 510 px) on Quadro RTX 8000 GPUs, was also evaluated.
- Baselines: DAIN (depth-aware interpolation), IsoFLOW (optical-flow-based inter-slice method), and linear interpolation.

Evaluation protocols:
- Slice-removal experiments (FIB-SEM): remove 1, 3, or 7 consecutive frames, interpolate them, and compare against ground truth over 100 images. For some tests, three additional frames were generated between every two frames.
- Segmentation and binarization (FIB-SEM): trainable WEKA segmentation produced pore vs. nanosheet probability maps from manually labeled pixels, thresholded with FIJI's ISODATA method to obtain binary masks. A single trained classifier and threshold were applied consistently across all methods.
- Computer-vision metrics (FIB-SEM): mean squared error (MSE) and structural similarity index (SSIM), computed per reconstructed frame against ground truth and averaged over 100 images (see the scoring sketch after this list).
- Morphological metrics (FIB-SEM): porosity P per frame computed from the binary masks, with the error reported as ΔP = 100% · |P_m − P_GT| / P_GT averaged over frames. For transport-relevant properties, tortuosity τ and effective diffusivity D_eff (with D_eff = D·P/τ, where D is the intrinsic diffusivity and P the porosity) were computed with TAUFACTOR (GPU-accelerated Python version) on ten randomly selected volumes (55–60% of the full volume), assessing Δτ and ΔD_eff against ground truth.
- Feature-size dependence (FIB-SEM): the porosity analysis was repeated for networks with different average nanosheet lengths (695 nm, 298 nm, 80 nm) at three replaced frames to probe sensitivity to rapid inter-slice changes.
- MRI: after removing every other slice, the volume was reconstructed with RIFE HD and full-brain segmentation was performed with BrainSuite's Anatomical pipeline on both original and reconstructed volumes; the gray-matter volume fraction GMV = P_gray / (P_gray + P_white) was computed and the percent difference reported.
- CT: three additional frames were generated between each pair of original slices to approach cubic voxels; image quality was assessed via the noise power spectrum in the coronal plane, computed over uniform ROIs from stacks of ten images (no ground truth available) and compared between the original and RIFE-augmented data.

Computing: experiments, including the tortuosity computations, were run on Nvidia Quadro RTX 8000 GPUs. Code was adapted from the RIFE repositories with minor modifications and is provided in an open repository.
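For the FIB-SEM slice-removal tests, each reconstructed slice is compared with the slice that was removed. The sketch below is a minimal illustration of that per-slice scoring, assuming the ground-truth and reconstructed slices are already available (e.g., from a stack built as in the earlier sketch); the fixed grey-level threshold is a stand-in for the WEKA + ISODATA segmentation used in the paper, not the authors' actual pipeline.

```python
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def score_reconstruction(gt_slices, recon_slices, pore_threshold=0.5):
    """Compare reconstructed slices against the removed ground-truth slices.

    Returns mean MSE, mean SSIM, and the mean relative porosity error
    dP = 100 * |P_m - P_GT| / P_GT over the slices. The fixed threshold
    stands in for the trainable WEKA segmentation + ISODATA thresholding
    used in the paper.
    """
    mse, ssim, dp = [], [], []
    for gt, rc in zip(gt_slices, recon_slices):
        mse.append(mean_squared_error(gt, rc))
        ssim.append(structural_similarity(gt, rc,
                                          data_range=float(gt.max() - gt.min())))
        p_gt = np.mean(gt < pore_threshold)   # pore area fraction, ground truth
        p_rc = np.mean(rc < pore_threshold)   # pore area fraction, reconstruction
        dp.append(100.0 * abs(p_rc - p_gt) / p_gt)
    return float(np.mean(mse)), float(np.mean(ssim)), float(np.mean(dp))
```

The transport metrics are not per-slice quantities: tortuosity τ and effective diffusivity D_eff = D·P/τ are computed with TAUFACTOR on the binarized 3D sub-volumes, as described above.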
Key Findings
- Visual fidelity (FIB-SEM): RIFE (HD and fine-tuned) preserves sharpness across the field of view without systematic failures; linear interpolation blurs edges; DAIN over-smooths borders; IsoFLOW performs well centrally but fails at borders, necessitating cropping.
- Computer-vision metrics (FIB-SEM): across 1, 3, and 7 removed frames, the RIFE variants consistently achieve the best MSE/SSIM; all methods degrade with more removed frames; IsoFLOW appears competitive on these metrics but does not preserve information at borders.
- Porosity accuracy (FIB-SEM): the ΔP advantage for RIFE is clear. With one removed frame, all methods but linear interpolation are close to ground truth; with three removed frames, RIFE significantly outperforms the others; with seven removed frames, the RIFE error remains below 2% on average. IsoFLOW's border errors degrade porosity despite favorable MSE/SSIM; linear interpolation struggles because it cannot preserve sharp pore–nanosheet boundaries.
- Transport metrics (FIB-SEM): for three replaced frames, RIFE keeps Δτ and ΔD_eff below 2% (averaged over volumes); DAIN exceeds 10% error already for one replaced frame, so further testing was not pursued.
- Feature-size dependence: performance degrades as the average nanosheet length decreases (more abrupt inter-slice changes), yet with three replaced frames the porosity error remains below ~2% even for L = 80 nm (with ~50 nm equivalent milling distance). Seven replaced frames correspond to ~100 nm effective milling thickness with acceptable errors in earlier tests.
- MRI brain: removing every other slice and reconstructing with RIFE HD yields a GMV difference of 0.5% between original and reconstructed volumes after BrainSuite segmentation, indicating preserved morphometric information and suggesting potential scan-time reductions.
- CT torso: adding three frames between each pair of slices reduces the noise power spectrum across the frequency range in uniform regions, consistent with visually smoother images; this demonstrates potential for dose/time reduction, though no ground truth was available (see the NPS sketch below).
- General: motion-aware interpolation (RIFE) outperforms the linear and optical-flow baselines, particularly by avoiding edge blurring and boundary failures, and maintains morphological observables within ~2% error under practical sampling reductions (e.g., milling thickness less than ~half the nanosheet length).
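The CT result above rests on a noise power spectrum measured over uniform ROIs in the coronal plane. The function below is a generic sketch of how such a radially averaged NPS is commonly estimated; the ROI handling, mean-subtraction, and normalization are standard assumptions, not details taken from the paper.

```python
import numpy as np

def radial_nps(rois, pixel_size_mm):
    """Estimate a radially averaged noise power spectrum (NPS) from
    square ROIs cut from uniform regions of the coronal-plane images.

    rois: array of shape (n_rois, N, N). Each ROI is mean-subtracted so
    that only the noise texture contributes to the spectrum.
    Returns spatial frequencies (cycles/mm) and the 1D NPS.
    """
    n_rois, n, _ = rois.shape
    nps_2d = np.zeros((n, n))
    for roi in rois:
        noise = roi - roi.mean()
        nps_2d += np.abs(np.fft.fftshift(np.fft.fft2(noise))) ** 2
    # Standard normalization: pixel area over ROI size, averaged over ROIs.
    nps_2d *= pixel_size_mm ** 2 / (n * n * n_rois)

    # Radial average around the zero-frequency bin.
    y, x = np.indices((n, n)) - n // 2
    r = np.hypot(x, y).astype(int)
    nps_1d = np.bincount(r.ravel(), weights=nps_2d.ravel()) / np.bincount(r.ravel())
    freqs = np.arange(nps_1d.size) / (n * pixel_size_mm)
    return freqs, nps_1d
```

A curve that sits lower across the whole frequency axis, as reported for the RIFE-augmented stack, corresponds to reduced noise magnitude at all spatial scales in the uniform regions.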
Discussion
The study demonstrates that repurposing video-frame interpolation along the through-plane direction of tomographic stacks can recover near-cubic voxel sampling while preserving quantitative information critical for materials and medical imaging. In FIB-SEM, RIFE's coarse-to-fine flow estimation maintains edges and details, translating into accurate porosity and transport properties even when multiple slices are missing, thereby enabling fewer destructive milling steps with minimal impact on morphology. The dependence on feature size clarifies the operating limits: when the milling thickness remains below roughly half the characteristic feature length, morphological errors stay under ~2%, defining practical acquisition regimes that mitigate damage and reduce time. In MRI, the near-identical morphometrics (0.5% GMV difference) obtained when halving the number of slices suggest that scan-time reductions are feasible without compromising key measurements. For CT, the reduced noise power spectrum after augmentation indicates improved image quality that could support dose reduction. Together, the results show that motion-aware interpolation preserves information better than linear and hybrid optical-flow methods, especially at boundaries, and can be deployed across scales, in many cases without retraining.
Conclusion
Applying a state-of-the-art video-frame interpolation model (RIFE) to tomographic stacks enhances through-plane sampling toward cubic voxels while preserving key information. On FIB-SEM graphene networks, RIFE outperforms the linear, optical-flow, and depth-aware baselines, keeping porosity, tortuosity, and effective diffusivity within ~2% error under substantial slice removal, and remaining robust across feature sizes when the sampling interval is less than about half the feature length. In MRI, it preserves brain morphometrics with a 0.5% GMV difference after halving the number of slices; in CT, it reduces the noise power across frequencies after augmentation. These findings suggest practical pathways to reducing destructive milling steps, acquisition time, and potentially radiation dose while retaining information content. Future work should integrate clinical protocols to quantify decision-making benefits, explore domain-specific fine-tuning across varied modalities and specimens, and leverage next-generation interpolation architectures to further improve fidelity and operational limits.
Limitations
- Medical datasets lacked ground truth for CT (and provided only partial validation for MRI), limiting the information-content assessment to computer-vision metrics and noise analysis rather than direct quantitative comparisons.
- Video-frame interpolation performance is constrained by the information available in adjacent frames; abrupt inter-slice changes (e.g., very short features relative to the slice spacing) can degrade results.
- Segmentation-based metrics (porosity, etc.) depend on user-guided WEKA training and thresholding choices, although a single classifier/threshold was applied consistently across methods.
- Tortuosity/diffusivity computations require sufficiently large sample volumes for representativeness and are computationally intensive; only the top-performing methods (RIFE HD and DAIN) were evaluated in this step.
- IsoFLOW's border failures highlight sensitivity to boundary conditions; cropping can remove valuable information.
- Generative models (e.g., GANs) were avoided due to solution ambiguity; conclusions pertain only to the interpolation methods evaluated.
- The demonstrated performance bounds (e.g., errors <2% when milling thickness < ~half the feature length) are dataset-specific and may vary with other materials, modalities, and acquisition protocols; broader validation is needed.
- Clinical utility and regulatory readiness require practitioner studies and standardized evaluation protocols that are not yet established.