Medicine and Health

Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs

K. V. E. Risager, T. Gholamalizadeh, et al.

Research by Karl Van Eeden Risager, Torkan Gholamalizadeh, and Mostafa Mehdipour Ghazi on assessing brain MRI quality with a novel, non-reference deep learning approach. The method scores artifacts in 3D MRI scans and is applied to high-fidelity synthetic 3D images produced by a wavelet diffusion model, with the aim of supporting higher medical imaging standards.

Introduction

The generation of synthetic brain MRI images using deep learning offers solutions to critical challenges in medical imaging, such as the scarcity of labeled data and the need for domain adaptation. Synthetic images enhance the robustness and generalizability of deep learning models across various clinical settings, which can differ in imaging protocols, scanner types, and patient populations. Furthermore, synthetic data helps mitigate privacy concerns by providing anonymized data that complies with data-sharing regulations in medical research. High-quality synthetic images are essential for the reliability of deep learning tools in clinical diagnosis and treatment planning.

Recent advancements in deep generative models, particularly denoising diffusion probabilistic models (DDPMs), have demonstrated the potential to create realistic synthetic images, surpassing traditional methods like GANs and VAEs, especially when dealing with high-resolution 3D brain MRIs. The 3D wavelet diffusion model (WDM) has shown promise in handling high-dimensional data, producing results that outperform state-of-the-art techniques. However, a significant limitation persists: the lack of comprehensive quality assessment methods specifically tailored for these generated images. Such assessment is crucial for ensuring the clinical usability and trustworthiness of synthetic data.

Existing image quality metrics, such as SSIM and PSNR, require reference images, which presents a challenge for generative models, whose outputs have no direct one-to-one correspondence with real images. Metrics like FID evaluate similarity based on groups of images, rather than individual image quality. Traditional non-reference metrics designed for 2D natural images, like BRISQUE, PIQE, NIMA, and GIQA, are less effective in the complex context of medical imaging, failing to capture specific artifacts inherent in MRI modalities such as motion artifacts, bias fields, and complex noise. Therefore, there's a clear need for specialized quality assessment methods in medical imaging that address domain-specific variations and help ensure high-quality synthetic images.

Literature Review

The paper reviews existing image quality assessment (IQA) methods, highlighting their limitations when applied to 3D medical images, particularly brain MRIs. Traditional methods like SSIM and PSNR rely on reference images, making them unsuitable for evaluating generative models. Metrics like FID assess image quality based on group comparisons, not individual images. Methods designed for 2D natural images (BRISQUE, PIQE, NIMA, GIQA) are inadequate for the complexities of medical images and fail to address domain-specific artifacts like motion artifacts, bias fields, and Rician noise. The lack of a comprehensive non-reference IQA method for 3D medical images, specifically MRIs, motivates the proposed research.
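
To make the reference-image limitation concrete, here is a minimal NumPy sketch of PSNR. The function name `psnr`, the `data_range` parameter, and the random test volume are illustrative assumptions, not code from the paper. The point is only that the metric cannot be computed without a voxel-aligned reference.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test volume."""
    if reference.shape != test.shape:
        # A generated image with no paired real image cannot be scored this way.
        raise ValueError("PSNR needs a voxel-aligned reference of the same shape")
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy data (assumption): a random volume and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.random((16, 16, 16))
noisy = np.clip(reference + rng.normal(0.0, 0.05, reference.shape), 0.0, 1.0)
print(f"PSNR against the reference: {psnr(reference, noisy):.1f} dB")
```

A sample drawn from a generative model has no such paired `reference`, which is the gap a non-reference score is meant to fill.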

Methodology

This study proposes a novel non-reference quality assessment method for 3D brain MRIs using a deep learning approach. The methodology involves two key components: a generative network and a quality network.

**Generative Network:** A Wavelet Diffusion Model (WDM) is employed to generate high-quality synthetic 3D brain MRIs. The WDM utilizes a discrete wavelet transform (DWT) to decompose input images into wavelet coefficients, which are then processed by a denoising diffusion probabilistic model. After denoising, an inverse DWT reconstructs the synthetic image at full resolution.

**Quality Network:** A 3D ResNet-50 architecture is trained as a regression network to assess image quality based on six distinct artifacts: contrast change, bias field, Gibbs ringing, motion ghosting, Rician noise, and blur. The network takes a 3D brain MRI as input and outputs six quality scores (one for each artifact), each ranging from 0 to 1 (0 being the lowest and 1 being the highest quality). To simulate these artifacts during training, dynamic data augmentation techniques are applied to high-quality reference images. These augmentations include gamma transforms for contrast, elliptic gradient fields for bias field, k-space truncation for ringing, k-space line weighting for ghosting, Gaussian noise addition for Rician noise, and resampling/Gaussian smoothing for blur. The network is trained using a focal MSE loss function, which addresses the imbalance between high- and low-quality scores.

**Data Augmentation and Inference:** To enhance robustness and generalization, several data augmentation techniques are applied on the fly during both training and inference. These include random translations, rotations, flipping, elastic deformations, and skull stripping. The final quality score for each artifact is obtained by averaging the predictions from the original and flipped images.

**Datasets:** Multiple public datasets (ADNI, OASIS, Hammers, IBSR, IXI, SynthRAD, BraTS) are used for training, validation, testing, and evaluation. Images are preprocessed to ensure standardization (intensity normalization, resampling to 1 mm isotropic resolution, padding/cropping to 224x224x224).
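
The artifact simulation described above can be sketched in NumPy for three of the six artifacts. This is a rough illustration under stated assumptions, not the authors' implementation. The function names, the parameter ranges, and the mapping from a severity in [0, 1] to a target score of `1 - severity` are all hypothetical. The summary only says Gaussian noise is added, so taking the magnitude of a noisy complex signal to obtain Rician-distributed noise is likewise an assumption.

```python
import numpy as np

def gamma_contrast(vol, gamma):
    """Contrast change: gamma transform on intensities scaled to [0, 1]."""
    return np.clip(vol, 0.0, 1.0) ** gamma

def gibbs_ringing(vol, keep_fraction, axis=0):
    """Gibbs ringing: discard high-frequency k-space samples along one axis."""
    n = vol.shape[axis]
    keep = min(n, max(1, int(round(n * keep_fraction))))
    start = n // 2 - keep // 2              # window centred on the DC sample
    mask = np.zeros(n)
    mask[start:start + keep] = 1.0
    shape = [1] * vol.ndim
    shape[axis] = n
    kspace = np.fft.fftshift(np.fft.fftn(vol)) * mask.reshape(shape)
    return np.abs(np.fft.ifftn(np.fft.ifftshift(kspace)))

def rician_noise(vol, sigma, rng):
    """Noise: Gaussian noise on real and imaginary channels, then magnitude."""
    real = vol + rng.normal(0.0, sigma, vol.shape)
    imag = rng.normal(0.0, sigma, vol.shape)
    return np.sqrt(real ** 2 + imag ** 2)

def degrade(vol, artifact, severity, rng):
    """Apply one artifact at a severity in [0, 1]; return (image, target score).

    The parameter ranges and the score mapping are hypothetical choices.
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must lie in [0, 1]")
    if artifact == "contrast":
        out = gamma_contrast(vol, gamma=1.0 + severity)
    elif artifact == "ringing":
        out = gibbs_ringing(vol, keep_fraction=1.0 - 0.7 * severity)
    elif artifact == "noise":
        out = rician_noise(vol, sigma=0.2 * severity, rng=rng)
    else:
        raise ValueError(f"unsupported artifact: {artifact}")
    return out, 1.0 - severity

# Toy stand-in for an intensity-normalised brain volume (assumption).
rng = np.random.default_rng(0)
phantom = np.zeros((32, 32, 32))
phantom[8:24, 8:24, 8:24] = 0.8
for name in ("contrast", "ringing", "noise"):
    image, score = degrade(phantom, name, severity=0.5, rng=rng)
    print(name, image.shape, score)
```

Drawing the severity at random for every training sample yields image and score pairs on the fly, which is one plausible reading of the dynamic augmentation the summary describes.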

Key Findings

The study's key findings demonstrate the superior performance of the proposed non-reference quality assessment method compared to state-of-the-art metrics. The 3D ResNet-50 network accurately estimates image quality across various datasets and different artifact types, even generalizing well to unseen data. The results are presented in several tables:

* **Table 1:** Shows the comparison of predicted and ground truth quality scores for various distortions on the ADNI test set. The network exhibits low mean squared error (MSE), indicating accurate estimation. Discrepancies are observed mainly for blur, potentially due to the similarity between interpolation/smoothing artifacts and other distortions.
* **Table 2:** Compares the proposed method with other IQA metrics (PIQE, SSIM, BRISQUE, PSNR) on several real datasets (ADNI, IXI, OASIS, Hammers, IBSR). The proposed method consistently outperforms others, especially for distorted images. The limitations of SSIM and PSNR (requiring reference images) and BRISQUE (unclear score range) are highlighted.
* **Table 3:** Compares the proposed method with other IQA metrics on synthetic datasets generated by the WDM. The proposed scores correlate well with SSIM, BRISQUE, PSNR, and FID, but offer better interpretability and a clear 0-1 range.
* **Table 4:** Analyzes the quality of generated images from different datasets for each distortion type. It reveals varying levels of quality issues, with contrast being a prevalent problem across datasets. Noise is more prominent in ADNI-generated images, while blur and ringing effects are observed in BraTS and SynthRAD-generated images respectively. This suggests avenues for improving generative models by adjusting loss functions to address specific artifacts.

Qualitative assessment of low-quality images (Figure 1) further supports the accuracy of the proposed method.
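
As a small illustration of the kind of comparison Table 1 reports, the sketch below computes a per-artifact MSE between predicted and ground-truth scores. The score values are made-up toy numbers, not results from the paper, and the names `ARTIFACTS` and `per_artifact_mse` are hypothetical.

```python
import numpy as np

# Order of the six outputs is an assumption made for this example.
ARTIFACTS = ("contrast", "bias field", "ringing", "ghosting", "noise", "blur")

def per_artifact_mse(predicted, target):
    """Mean squared error per artifact over a batch of six-score vectors."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if predicted.shape != target.shape or predicted.shape[-1] != len(ARTIFACTS):
        raise ValueError("expected two arrays of shape (n_images, 6)")
    errors = np.mean((predicted - target) ** 2, axis=0)
    return dict(zip(ARTIFACTS, errors))

# Two toy images: one undistorted (all scores 1.0), one mildly degraded.
target = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
          [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]]
predicted = [[0.9, 1.0, 1.0, 1.0, 1.0, 0.8],
             [0.5, 0.5, 0.5, 0.5, 0.5, 0.7]]

for name, value in per_artifact_mse(predicted, target).items():
    print(f"{name:>10s}: {value:.3f}")
```

In this toy batch the largest error falls on blur, chosen only to mirror the pattern the summary describes for Table 1.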

Discussion

The proposed non-reference quality assessment method addresses the critical need for a robust and comprehensive approach to evaluating 3D brain MRI image quality, particularly in the context of synthetic image generation. The superior performance demonstrated across various datasets and artifact types showcases its effectiveness and generalizability. The method's ability to operate without reference images is a significant advantage, making it applicable to a wider range of scenarios, including the evaluation of deep generative models. The intuitive 0-1 scoring system enhances interpretability and facilitates comparison across heterogeneous datasets. The detailed analysis of specific artifacts in generated images provides valuable insights for improving the quality of synthetic data generated by deep learning models. This work contributes significantly to the field by providing a tool for assessing the quality of 3D medical images, which can ultimately improve the accuracy and reliability of medical image analysis and clinical applications.

Conclusion

This study presents a novel, comprehensive, and non-reference-based approach for assessing the quality of 3D brain MRI images. The proposed method, based on a 3D ResNet-50, accurately estimates image quality concerning six common artifacts and generalizes well across multiple datasets. Its superior performance compared to state-of-the-art metrics, along with its interpretability and ability to work without reference images, makes it a significant contribution to the field of medical image quality assessment. Future research could explore the application of this method to other medical imaging modalities and the development of more sophisticated generative models guided by the insights provided by this quality assessment framework.

Limitations

While the proposed method demonstrates superior performance, some limitations should be acknowledged. The training of the 3D ResNet network relied heavily on the ADNI dataset. While generalization to other datasets was successful, further investigation with a more diverse and larger training dataset could potentially enhance performance and robustness. The focus on six specific artifacts might overlook other potential image quality issues. Future work could include expanding the range of artifacts considered and exploring more diverse augmentation strategies. The computational cost of processing 3D images remains relatively high, though this is a common challenge in 3D medical image analysis.
