Introduction
Intracranial aneurysms (IAs) are a significant health concern, affecting an estimated 3.2% of the general population and causing the majority of spontaneous subarachnoid hemorrhages (SAHs). Early and accurate diagnosis is crucial for effective management and improved patient outcomes. Computed tomography angiography (CTA) is the recommended non-invasive imaging technique for IA detection and follow-up, but its interpretation is time-consuming, requires specialized training, and suffers from inter-observer variability and high false-negative rates. Diagnostic accuracy is further affected by aneurysm size, CT scanner specifications, acquisition protocols, image quality, and radiologist experience, with reported sensitivities ranging from 28% to 97.8%. The increasing use of CTA, especially in acute ischemic stroke (AIS) workup, adds to radiologists' workload, highlighting the need for high-performance computer-aided diagnosis (CAD) tools. While previous CAD systems based on conventional image features, or on deep learning (DL) combined with magnetic resonance angiography (MRA), have shown promise, CTA-based DL models for IA detection have been limited by small sample sizes, lack of external validation, and insufficient real-world testing. This study therefore aims to develop and rigorously validate a robust DL model for IA detection on CTA images that is suitable for real-world clinical application, addressing the shortcomings of existing approaches.
Literature Review
Existing computer-aided detection (CAD) systems for intracranial aneurysms (IAs) have primarily relied on conventional methods using pre-defined image features like vessel curvature, thresholding, or region-growing algorithms. However, these approaches often lack robustness and generalization capabilities in real-world clinical settings. Recent advances in deep learning (DL) have demonstrated significant potential in medical image analysis, achieving or even surpassing expert-level diagnostic accuracy in various applications. While DL has shown promise in IA detection using magnetic resonance angiography (MRA), its application to computed tomography angiography (CTA) has been less explored. Previous studies using CTA-based CAD systems for IA detection have been limited by small datasets, lack of independent external validation, and failure to adequately simulate real-world clinical scenarios, hindering their clinical applicability. This study addresses these limitations by employing a large, diverse dataset and rigorous validation across multiple centers and clinical settings.
Methodology
This study developed a 3D convolutional neural network (CNN) segmentation model, DAResUNet, for automated IA detection in digital subtraction bone-removal CTA images. The model used an encoder-decoder architecture similar to U-Net, incorporating residual blocks to enhance training stability and a dual attention block to improve feature representation and capture long-range contextual information. The model was trained on a large dataset of 1177 digital subtraction angiography (DSA)-verified bone-removal CTA cases (Internal cohort 1) from Jinling Hospital, split into training, tuning, and testing sets. Rigorous validation was performed across eight cohorts (Table 1) encompassing diverse clinical scenarios. Internal validation cohorts (Internal cohorts 2-5) assessed performance on independent datasets from Jinling Hospital, including an evaluation of image quality and a simulated real-world clinical setting for suspected acute ischemic stroke (AIS). External validation cohorts (NBH, TJ, LYG) assessed generalizability on datasets from three independent hospitals, including the impact of different CT manufacturers. The model's performance was compared with that of six board-certified radiologists and two expert neurosurgeons on consecutive real-world cases with suspected IAs (Internal cohort 4 and the LYG cohort). Image quality was assessed on a four-point scale based on noise, vessel sharpness, and overall quality. The analysis included patient-level and lesion-level metrics, such as sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and Dice coefficient. Statistical significance was determined using Pearson's chi-squared test, Fisher's exact test, and the Wald method with Agresti-Caffo correction for non-inferiority comparisons.
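The patient-level metrics and the Dice coefficient mentioned above can be sketched in a few lines; this is an illustrative reimplementation from the standard definitions, not the authors' code, and the label sequences are hypothetical.

```python
# Illustrative sketch: confusion-matrix metrics (patient level) and a
# Dice coefficient (voxel level), as commonly defined in segmentation studies.

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and accuracy from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(y_true),
    }

def dice(mask_a, mask_b):
    """Dice coefficient between two flattened binary masks."""
    intersection = sum(a & b for a, b in zip(mask_a, mask_b))
    return 2 * intersection / (sum(mask_a) + sum(mask_b))
```

For example, `confusion_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` yields a sensitivity and NPV of 2/3 each, since one aneurysm-positive case is missed and one negative case is over-called.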
Key Findings
The DAResUNet model achieved high performance in the primary validation (Internal cohort 1), with a patient-level sensitivity of 97.3% and a lesion-level sensitivity of 95.6% on the testing set. Internal validation cohorts showed consistent performance, with patient-level sensitivity generally above 70%, except for Internal cohort 3 (the image quality study). External validation cohorts demonstrated good generalizability, although performance varied notably across CT manufacturers: Siemens scanners performed best, followed by GE, with Toshiba showing lower sensitivity. Comparison with radiologists and neurosurgeons (Internal cohort 4 and the LYG cohort) showed superior patient-level sensitivity for the model (p=0.037 vs. radiologists in Internal cohort 4; p=0.022 vs. radiologists and p=0.037 vs. neurosurgeons in the LYG cohort). In the AIS setting (Internal cohort 5), the model achieved a negative predictive value (NPV) of 99.0%, suggesting its potential to reduce radiologist workload by confidently ruling out IA-negative cases. Error analysis showed that misdiagnosed cases were primarily attributable to tiny aneurysms (<3 mm), unusual aneurysm shapes, and difficulty differentiating aneurysms from normal structures at certain anatomical locations. The model's processing time was substantially shorter than that of the radiologists and comparable to that of the neurosurgeons.
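The non-inferiority comparisons between the model's sensitivity and the readers' sensitivity rely on the Agresti-Caffo adjusted Wald interval for a difference of two proportions. A minimal sketch of that interval is below; the counts, the 10% margin, and the function names are hypothetical choices for illustration, not the study's data.

```python
# Hedged sketch of the Agresti-Caffo adjusted Wald interval: add one
# success and one failure to each group, then apply the Wald formula.
from math import sqrt

def agresti_caffo_ci(x1, n1, x2, n2, z=1.96):
    """Approximate 95% CI for p1 - p2 (x successes out of n per group)."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se

def non_inferior(x1, n1, x2, n2, margin=0.10):
    """Group 1 (e.g. the model) is declared non-inferior to group 2
    (e.g. the readers) if the lower CI bound exceeds -margin."""
    lower, _ = agresti_caffo_ci(x1, n1, x2, n2)
    return lower > -margin
```

With hypothetical counts of 90/100 detections for the model and 85/100 for the readers, the interval comfortably clears a 10% margin; the adjustment keeps the interval well behaved even when a group detects nearly every case.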
Discussion
This study demonstrates the successful development and validation of a clinically applicable deep learning model for IA detection in CTA images. The model's superior sensitivity compared to radiologists and neurosurgeons, along with its high NPV in the AIS setting, suggests significant potential for improving clinical workflow and reducing radiologist workload. The model's robustness across different image qualities and CT manufacturers supports its potential for widespread applicability. However, its limitations with smaller aneurysms and certain anatomical locations must be addressed in future iterations, for example by incorporating more diverse and higher-resolution datasets. The model's faster processing time compared to radiologists is an added advantage for efficient clinical use. Although the model showed superior sensitivity in many comparisons, care must be taken that its outputs are not misinterpreted or over-relied upon in ways that undermine human expertise. Future prospective multicenter studies are needed to conclusively demonstrate the impact of this model on patient care and clinical outcomes.
Conclusion
This study presents a deep learning model for intracranial aneurysm detection in CTA images that demonstrates improved sensitivity compared to human experts and high confidence in identifying negative cases. The model's robustness and speed suggest potential for clinical integration to improve efficiency and reduce workload. However, further prospective multicenter trials are needed to fully evaluate its clinical impact and address limitations related to small aneurysms and varying CT manufacturers. Future research should focus on refining the model, improving its interpretability, and incorporating it seamlessly into clinical workflows.
Limitations
Several limitations warrant consideration. First, the relatively small sample size in some validation cohorts limits the generalizability of the findings. Second, the exclusion of certain pathologies (AVM/AVF, head trauma) during data curation may affect the model's performance on studies with such features. Third, the study did not evaluate radiologists' performance when augmented by the model; further research is needed in this area. Fourth, model performance varied across CT manufacturers. Finally, the absence of a prospective, multicenter controlled trial limits the definitive assessment of the model's clinical impact.