Video frame interpolation neural network for 3D tomography across different length scales

Engineering and Technology


L. Gambini, C. Gabbett, et al.

In this work, Laura Gambini and colleagues at Trinity College Dublin use a neural network for video-frame interpolation to enhance the through-plane resolution of tomographic images. By applying the method to several imaging modalities, the team shows how 3D tomography acquisition can be optimized while preserving quantitative information.

Introduction
The paper addresses the challenge of limited and anisotropic resolution in 3D tomography across length scales, where acquisition constraints (e.g., dose, speed, destructive sampling) hinder cubic-voxel reconstructions and the extraction of precise physical information. The authors ask whether image augmentation via neural video-frame interpolation can enhance through-plane resolution to recover cubic voxels, preserve information content, and relax acquisition constraints (e.g., fewer destructive FIB-SEM slices, reduced CT dose, shorter MRI scans). They motivate the problem with FIB-SEM nanotomography, where the in-plane (xy) resolution (~5 nm) is much finer than the z-direction slice thickness (10–20 nm), leading to anisotropic voxels and slow, potentially damaging acquisitions. They propose reframing the z-direction as a temporal axis and applying motion-aware frame interpolation to insert intermediate slices, thereby improving isotropy while maintaining the morphology needed for accurate physical metrics.
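Concretely, reframing the stack index as a time axis means each pair of adjacent slices is handed to the interpolation model as if they were consecutive video frames, and the returned intermediate images are interleaved into the volume. The sketch below is a minimal illustration of this idea, not the authors' released code; `rife_interp(a, b, t)` is a hypothetical wrapper around one inference call of a frame-interpolation model, returning the frame at fractional position t between slices a and b.

```python
import numpy as np

def augment_stack(stack, n_between, rife_interp):
    """Interleave n_between interpolated slices between every pair of
    adjacent slices, shrinking the effective through-plane spacing by a
    factor of (n_between + 1).

    stack: array of shape (n_slices, H, W).
    rife_interp(a, b, t): hypothetical stand-in for the interpolation
    model, returning the frame at fractional position 0 < t < 1 between
    slices a and b.
    """
    out = []
    for a, b in zip(stack[:-1], stack[1:]):
        out.append(a)
        for k in range(1, n_between + 1):
            out.append(rife_interp(a, b, k / (n_between + 1)))
    out.append(stack[-1])
    return np.stack(out)

# Example: a 5 x 5 x 15 nm^3 FIB-SEM voxel becomes near-cubic with two
# interpolated slices per gap; the CT experiment in the paper inserts three.
# isotropic = augment_stack(fib_sem_stack, n_between=2, rife_interp=my_rife)
```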
Literature Review
Conventional linear interpolation provides acceptable results only when structural variation between adjacent slices is very smooth; otherwise, it blurs edges and causes information loss. Optical-flow-based slice interpolation improves on linear methods but performs poorly at image borders, forcing cropping and loss of information. Deep-learning approaches for anisotropic 3D super-resolution have been proposed: training cross-sectional models to super-resolve along the milling direction assumes morphological equivalence across axes and can introduce bias; GAN-based 2D/3D fusion can generate plausible volumes but lacks uniqueness and can be ambiguous for quantitative reconstruction. In contrast, video-frame interpolation with deep learning is designed to reconstruct temporal evolution and improve perceptual quality; state-of-the-art Real-Time Intermediate Flow Estimation (RIFE) achieves strong accuracy and speed without auxiliary networks. The authors build on this by applying RIFE to tomographic stacks, anticipating that the model’s known limitations (determined by available frame information and motion complexity) transfer to tomography use cases.
Methodology
Overview: The tomographic through-plane (milling or slice) direction is treated as a temporal axis, and a video-frame interpolation network (RIFE) is applied to insert intermediate slices, aiming for cubic-voxel resolution while preserving morphological information.

Datasets:
- FIB-SEM printed graphene networks: 801 images (4041 × 510 px) at 5 nm in-plane resolution and 15 nm slice thickness (voxel size 5 × 5 × 15 nm³). A 100-image subset cropped to 510 × 510 px was used for computer-vision metrics and porosity; ten randomly selected sub-volumes (55–60% of the original volume) were used for tortuosity and effective diffusivity.
- MRI brain scans (sagittal): publicly available (Brainstorm), 1 mm³ isotropic voxels. Every other slice was removed to provide a ground-truth comparison.
- CT torso dataset (TCIA): 152 axial frames with a voxel size of 0.74 × 0.74 × 2.49 mm³; no ground truth available.

Interpolation models and settings:
- Primary: RIFE HD v4.6 (IFNet architecture, coarse-to-fine intermediate flow estimation, end-to-end learned flow, trained on Vimeo90K). Inference was adapted to accept sequences of arbitrary length and grayscale input. A fine-tuned RIFE (non-HD) model, trained on 1000 cropped graphene images (510 × 510 px) on Quadro RTX 8000 GPUs, was also evaluated.
- Baselines: DAIN (depth-aware interpolation), IsoFLOW (optical-flow-based inter-slice method), and linear interpolation.

Evaluation protocols:
- Slice-removal experiments (FIB-SEM): remove 1, 3, or 7 consecutive frames, interpolate them, and compare against ground truth over 100 images. For some tests, three additional frames were generated between every two frames.
- Segmentation and binarization (FIB-SEM): trainable WEKA segmentation produced pore vs. nanosheet probability maps from manually labeled pixels, thresholded with FIJI's ISODATA method to obtain binary masks. A single trained classifier and threshold were applied consistently across all methods.
- Computer-vision metrics (FIB-SEM): mean squared error (MSE) and structural similarity index (SSIM), computed per reconstructed frame against ground truth and averaged over 100 images (see the scoring sketch after this list).
- Morphological metrics (FIB-SEM): porosity P per frame computed from the binary masks, with the error reported as ΔP = 100% · |P_m − P_GT| / P_GT averaged over frames. For transport-relevant properties, tortuosity τ and effective diffusivity D_eff (with D_eff = D·P/τ, where D is the intrinsic diffusivity and P the porosity) were computed with TAUFACTOR (GPU-accelerated Python version) on ten randomly selected volumes (55–60% of the full volume), assessing Δτ and ΔD_eff against ground truth.
- Feature-size dependence (FIB-SEM): the porosity analysis was repeated for networks with different average nanosheet lengths (695 nm, 298 nm, 80 nm) at three replaced frames to probe sensitivity to rapid inter-slice changes.
- MRI: after removing every other slice, the volume was reconstructed with RIFE HD and full-brain segmentation was performed with BrainSuite's Anatomical pipeline on both original and reconstructed volumes; the gray-matter volume fraction GMV = P_gray / (P_gray + P_white) was computed and the percent difference reported.
- CT: three additional frames were generated between each pair of original slices to approach cubic voxels; image quality was assessed via the noise power spectrum in the coronal plane, computed over uniform ROIs from stacks of ten images (no ground truth available) and compared between the original and RIFE-augmented data.

Computing: experiments, including the tortuosity computations, were run on Nvidia Quadro RTX 8000 GPUs. Code was adapted from the RIFE repositories with minor modifications and is provided in an open repository.
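For the FIB-SEM slice-removal tests, each reconstructed slice is compared with the slice that was removed. The sketch below is a minimal illustration of that per-slice scoring, assuming the ground-truth and reconstructed slices are already available (e.g., from a stack built as in the earlier sketch); the fixed grey-level threshold is a stand-in for the WEKA + ISODATA segmentation used in the paper, not the authors' actual pipeline.

```python
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def score_reconstruction(gt_slices, recon_slices, pore_threshold=0.5):
    """Compare reconstructed slices against the removed ground-truth slices.

    Returns mean MSE, mean SSIM, and the mean relative porosity error
    dP = 100 * |P_m - P_GT| / P_GT over the slices. The fixed threshold
    stands in for the trainable WEKA segmentation + ISODATA thresholding
    used in the paper.
    """
    mse, ssim, dp = [], [], []
    for gt, rc in zip(gt_slices, recon_slices):
        mse.append(mean_squared_error(gt, rc))
        ssim.append(structural_similarity(gt, rc,
                                          data_range=float(gt.max() - gt.min())))
        p_gt = np.mean(gt < pore_threshold)   # pore area fraction, ground truth
        p_rc = np.mean(rc < pore_threshold)   # pore area fraction, reconstruction
        dp.append(100.0 * abs(p_rc - p_gt) / p_gt)
    return float(np.mean(mse)), float(np.mean(ssim)), float(np.mean(dp))
```

The transport metrics are not per-slice quantities: tortuosity τ and effective diffusivity D_eff = D·P/τ are computed with TAUFACTOR on the binarized 3D sub-volumes, as described above.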
Key Findings
- Visual fidelity (FIB-SEM): RIFE (HD and fine-tuned) preserves sharpness across the field of view without systematic failures; linear interpolation blurs edges; DAIN over-smooths borders; IsoFLOW performs well centrally but fails at borders, necessitating cropping.
- Computer-vision metrics (FIB-SEM): across 1, 3, and 7 removed frames, the RIFE variants consistently achieve the best MSE/SSIM; all methods degrade with more removed frames; IsoFLOW appears competitive on these metrics but does not preserve information at borders.
- Porosity accuracy (FIB-SEM): the ΔP advantage for RIFE is clear. With one removed frame, all methods but linear interpolation are close to ground truth; with three removed frames, RIFE significantly outperforms the others; with seven removed frames, the RIFE error remains below 2% on average. IsoFLOW's border errors degrade porosity despite favorable MSE/SSIM; linear interpolation struggles because it cannot preserve sharp pore–nanosheet boundaries.
- Transport metrics (FIB-SEM): for three replaced frames, RIFE keeps Δτ and ΔD_eff below 2% (averaged over volumes); DAIN exceeds 10% error already for one replaced frame, so further testing was not pursued.
- Feature-size dependence: performance degrades as the average nanosheet length decreases (more abrupt inter-slice changes), yet with three replaced frames the porosity error remains below ~2% even for L = 80 nm (with ~50 nm equivalent milling distance). Seven replaced frames correspond to ~100 nm effective milling thickness with acceptable errors in earlier tests.
- MRI brain: removing every other slice and reconstructing with RIFE HD yields a GMV difference of 0.5% between original and reconstructed volumes after BrainSuite segmentation, indicating preserved morphometric information and suggesting potential scan-time reductions.
- CT torso: adding three frames between each pair of slices reduces the noise power spectrum across the frequency range in uniform regions, consistent with visually smoother images; this demonstrates potential for dose/time reduction, though no ground truth was available (see the NPS sketch below).
- General: motion-aware interpolation (RIFE) outperforms the linear and optical-flow baselines, particularly by avoiding edge blurring and boundary failures, and maintains morphological observables within ~2% error under practical sampling reductions (e.g., milling thickness less than ~half the nanosheet length).
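The CT result above rests on a noise power spectrum measured over uniform ROIs in the coronal plane. The function below is a generic sketch of how such a radially averaged NPS is commonly estimated; the ROI handling, mean-subtraction, and normalization are standard assumptions, not details taken from the paper.

```python
import numpy as np

def radial_nps(rois, pixel_size_mm):
    """Estimate a radially averaged noise power spectrum (NPS) from
    square ROIs cut from uniform regions of the coronal-plane images.

    rois: array of shape (n_rois, N, N). Each ROI is mean-subtracted so
    that only the noise texture contributes to the spectrum.
    Returns spatial frequencies (cycles/mm) and the 1D NPS.
    """
    n_rois, n, _ = rois.shape
    nps_2d = np.zeros((n, n))
    for roi in rois:
        noise = roi - roi.mean()
        nps_2d += np.abs(np.fft.fftshift(np.fft.fft2(noise))) ** 2
    # Standard normalization: pixel area over ROI size, averaged over ROIs.
    nps_2d *= pixel_size_mm ** 2 / (n * n * n_rois)

    # Radial average around the zero-frequency bin.
    y, x = np.indices((n, n)) - n // 2
    r = np.hypot(x, y).astype(int)
    nps_1d = np.bincount(r.ravel(), weights=nps_2d.ravel()) / np.bincount(r.ravel())
    freqs = np.arange(nps_1d.size) / (n * pixel_size_mm)
    return freqs, nps_1d
```

A curve that sits lower across the whole frequency axis, as reported for the RIFE-augmented stack, corresponds to reduced noise magnitude at all spatial scales in the uniform regions.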
Discussion
The study demonstrates that repurposing video-frame interpolation along the through-plane direction of tomographic stacks can recover near-cubic voxel sampling while preserving quantitative information critical for materials and medical imaging. In FIB-SEM, RIFE's coarse-to-fine flow estimation maintains edges and details, translating into accurate porosity and transport properties even when multiple slices are missing, thereby enabling fewer destructive milling steps with minimal impact on morphology. The dependence on feature size clarifies the operating limits: when the milling thickness remains below roughly half the characteristic feature length, morphological errors stay under ~2%, defining practical acquisition regimes that mitigate damage and reduce time. In MRI, the near-identical morphometrics (0.5% GMV difference) obtained when halving the number of slices suggest that scan-time reductions are feasible without compromising key measurements. For CT, the reduced noise power spectrum after augmentation indicates improved image quality that could support dose reduction. Together, the results show that motion-aware interpolation preserves information better than linear and hybrid optical-flow methods, especially at boundaries, and can be deployed across scales, in many cases without retraining.
Conclusion
Applying a state-of-the-art video-frame interpolation model (RIFE) to tomographic stacks enhances through-plane sampling toward cubic voxels while preserving key information. On FIB-SEM graphene networks, RIFE outperforms the linear, optical-flow, and depth-aware baselines, keeping porosity, tortuosity, and effective diffusivity within ~2% error under substantial slice removal, and remaining robust across feature sizes when the sampling interval is less than about half the feature length. In MRI, it preserves brain morphometrics with a 0.5% GMV difference after halving the number of slices; in CT, it reduces the noise power across frequencies after augmentation. These findings suggest practical pathways to reducing destructive milling steps, acquisition time, and potentially radiation dose while retaining information content. Future work should integrate clinical protocols to quantify decision-making benefits, explore domain-specific fine-tuning across varied modalities and specimens, and leverage next-generation interpolation architectures to further improve fidelity and operational limits.
Limitations
- Medical datasets lacked ground truth for CT (and provided only partial validation for MRI), limiting the information-content assessment to computer-vision metrics and noise analysis rather than direct quantitative comparisons.
- Video-frame interpolation performance is constrained by the information available in adjacent frames; abrupt inter-slice changes (e.g., very short features relative to the slice spacing) can degrade results.
- Segmentation-based metrics (porosity, etc.) depend on user-guided WEKA training and thresholding choices, although a single classifier/threshold was applied consistently across methods.
- Tortuosity/diffusivity computations require sufficiently large sample volumes for representativeness and are computationally intensive; only the top-performing methods (RIFE HD and DAIN) were evaluated in this step.
- IsoFLOW's border failures highlight sensitivity to boundary conditions; cropping can remove valuable information.
- Generative models (e.g., GANs) were avoided due to solution ambiguity; conclusions pertain only to the interpolation methods evaluated.
- The demonstrated performance bounds (e.g., errors <2% when milling thickness < ~half the feature length) are dataset-specific and may vary with other materials, modalities, and acquisition protocols; broader validation is needed.
- Clinical utility and regulatory readiness require practitioner studies and standardized evaluation protocols that are not yet established.