Introduction
Cardiovascular diseases (CVDs) are the leading cause of death globally, with over 17.9 million deaths annually. While cardiac magnetic resonance imaging (CMR) is the gold standard for assessing cardiac function and diagnosing CVDs, its widespread use is hampered by how time-consuming CMR interpretation is and how much expertise it demands. A shortage of CMR-trained physicians further limits accessibility, particularly in low- and middle-income countries. Automated CMR interpretation systems are therefore needed for efficient and scalable CVD screening and diagnosis. Deep learning, which can learn distinctive features and recognize motion patterns from raw CMR data without extensive manual feature engineering, offers a promising solution. However, a comprehensive evaluation of deep learning's ability to analyze CMR data across a broad range of CVDs has been lacking: previous applications have focused on single aspects of CMR interpretation (e.g., segmentation or wall thickness measurement) or have demonstrated limited diagnostic capabilities. This study addresses that gap by developing and validating a deep learning approach for automated CMR interpretation and diagnosis of eleven CVD types. The approach mirrors the two-stage clinical workflow: noninvasive screening using cine MRI, followed by diagnosis using cine and late gadolinium enhancement (LGE) MRI.
Literature Review
The existing literature highlights both the challenges and the opportunities of using AI for CMR interpretation. Several studies demonstrate the potential of deep learning for individual tasks such as cardiac segmentation or detection of specific conditions (e.g., myocardial scarring or aortic valve malformations). However, these studies rarely cover the range of CVDs needed to support a complete clinical workflow. Transformer-based models such as the Video Swin Transformer (VST) used in this study have shown promise in computer vision tasks involving video sequences. This study builds on that work by applying the VST to CMR data, aiming for higher diagnostic performance and broader applicability across diverse CVD types.
Methodology
This study used a large, nationwide CMR dataset of 9,719 individuals (6,608 male and 3,111 female) from eight medical centers across China. The dataset comprised a CVD cohort (8,066 patients with 11 types of CVDs) and a normal control cohort (1,653 subjects). The 11 CVD types were hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), coronary artery disease (CAD), left ventricular noncompaction cardiomyopathy (LVNC), restrictive cardiomyopathy (RCM), cardiac amyloidosis (CAM), hypertensive heart disease (HHD), myocarditis, arrhythmogenic right ventricular cardiomyopathy (ARVC), pulmonary arterial hypertension (PAH), and Ebstein’s anomaly. The data were acquired on scanners from three MRI vendors (GE Healthcare, Philips, and Siemens). The study developed a two-stage AI model based on the Video Swin Transformer (VST) architecture: a screening model that takes cine MRI in short-axis (SAX) and four-chamber (4CH) views, and a diagnostic model that takes cine MRI together with SAX LGE MRI. The models were trained with threefold cross-validation on the primary dataset (Beijing Fuwai Hospital) and then validated on an external dataset from the remaining seven medical centers. Model performance was compared against the interpretations of cardiologists with varying levels of experience. Grad-CAM was used to assess model interpretability, and Shapley values were used to assess the relative influence of the cine and LGE inputs. A separate, independent consecutive test set from Beijing Fuwai Hospital provided additional real-world validation.
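The control flow of the two-stage workflow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `two_stage_predict`, the stub models, the 0.5 screening threshold, and the argmax decision rule are all hypothetical choices made for illustration. The real screening and diagnostic models are VST networks operating on image sequences, which are replaced here by plain callables.

```python
from typing import Callable, Optional, Sequence

# The 11 CVD classes named in the study (order here is arbitrary).
CVD_CLASSES = [
    "HCM", "DCM", "CAD", "LVNC", "RCM", "CAM",
    "HHD", "myocarditis", "ARVC", "PAH", "Ebstein's anomaly",
]


def two_stage_predict(
    cine: object,
    lge: Optional[object],
    screen: Callable[[object], float],
    diagnose: Callable[[object, object], Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> dict:
    """Stage 1 screens on cine MRI alone; stage 2 runs only for flagged cases."""
    p_abnormal = screen(cine)
    result = {
        "p_abnormal": p_abnormal,
        "flagged": p_abnormal >= threshold,
        "diagnosis": None,
    }
    # Stop early if screening is negative or no LGE series is available yet.
    if not result["flagged"] or lge is None:
        return result
    probs = list(diagnose(cine, lge))
    if len(probs) != len(CVD_CLASSES):
        raise ValueError("diagnose() must return one probability per CVD class")
    # Assumed decision rule: report the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    result["diagnosis"] = CVD_CLASSES[best]
    return result


# Stub models standing in for the trained networks (hypothetical).
def stub_screen(cine: object) -> float:
    return 0.9


def stub_diagnose(cine: object, lge: object) -> Sequence[float]:
    return [0.05] * 9 + [0.5, 0.05]  # highest score at the PAH index


print(two_stage_predict("cine-series", "lge-series", stub_screen, stub_diagnose))
```

Separating the two stages in this way means the contrast-enhanced LGE acquisition is only needed for subjects the cine-only screening step flags as abnormal.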
Key Findings
The screening model achieved an area under the curve (AUC) of 0.986 (95% CI 0.984–0.988) and an F1 score of 0.977 (95% CI 0.974–0.979) in the primary dataset, and an AUC of 0.990 (95% CI 0.986–0.992) in the external dataset. The diagnostic model, using both cine and LGE data, achieved a class-weighted average AUC of 0.991 and F1 score of 0.906 in the primary dataset, and an AUC of 0.991 and F1 score of 0.884 in the external dataset. Notably, the diagnostic model outperformed cardiologists with over 10 years of experience, particularly in diagnosing PAH (F1 score 0.983 vs. 0.931). The VST model also significantly outperformed a conventional CNN-LSTM approach. Grad-CAM analysis showed that the model focused on clinically relevant features for each CVD type, and Shapley value analysis showed that both the cine and LGE modalities were important for the diagnostic model. The independent consecutive test set further confirmed the high performance of both the screening model (AUC 0.984, F1 0.962) and the diagnostic model (AUC 0.986, F1 0.903). Sensitivity and specificity were generally above 90%.
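For readers unfamiliar with class-weighted averaging, the sketch below shows one common way to compute a class-weighted F1 score: the per-class F1 is averaged with weights proportional to each class's number of true cases. The summary does not state the exact weighting the authors used, so support-based weighting is an assumption here, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 label: Hashable) -> float:
    """F1 for one class, treating it as positive and all others as negative."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average of per-class F1, weighted by each class's support in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(per_class_f1(y_true, y_pred, c) * k / n for c, k in support.items())


# Toy labels for illustration only (not data from the study).
y_true = ["HCM", "HCM", "HCM", "PAH"]
y_pred = ["HCM", "HCM", "PAH", "PAH"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8 * 3/4 + (2/3) * 1/4 -> 0.7667
```

With support-based weighting, common classes dominate the average, so a high weighted score can coexist with weaker performance on rare classes; per-class figures such as the PAH comparison above remain informative for that reason.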
Discussion
This study demonstrates the high accuracy and effectiveness of AI-powered CMR interpretation for CVD screening and diagnosis. The AUC and F1 scores, comparable to or exceeding those of experienced cardiologists, suggest that AI can significantly improve the efficiency and scalability of CMR interpretation. The ability to screen effectively with cine MRI alone, which does not require gadolinium contrast, offers a less invasive and more cost-effective approach. The superior performance in diagnosing PAH highlights AI's potential to identify subtle features not easily detected by human experts, which is particularly valuable for conditions such as PAH where early diagnosis is crucial for patient outcomes. These findings are especially relevant given the global shortage of CMR experts and the substantial burden of CVD worldwide.
Conclusion
This study showcases the potential of AI-enabled CMR interpretation to significantly improve the efficiency and accuracy of CVD screening and diagnosis. The high-performing two-stage AI model, validated internally and externally, demonstrates a significant advancement in the field. Future research should focus on prospective clinical trials to confirm these findings in real-world clinical settings, explore the model's generalizability across diverse populations, and incorporate additional CMR modalities and clinical information to further enhance diagnostic performance and interpretability.
Limitations
This study has several limitations. The retrospective data collection and the small number of controls relative to the overall study population could affect the generalizability of the findings. In addition, all participating institutions were located in eastern Asia, which may limit generalizability to other populations. Although the models show high performance, prospective studies and clinical trials are essential before widespread implementation. Finally, full model interpretability remains a goal for future work.