A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images

Medicine and Health

Z. Shi, C. Miao, et al.

A study by Zhao Shi and colleagues demonstrates a deep-learning model that enhances the detection of intracranial aneurysms on computed tomography angiography, exceeding the patient-level sensitivity of radiologists and neurosurgeons in simulated real-world readings. Tested across varied imaging conditions, the model achieved a 99.0% negative predictive value in acute ischemic stroke triage, potentially reducing clinician workload and improving patient care.

Introduction
Intracranial aneurysms (IAs) are relatively common and potentially fatal, so early and accurate detection is critical, especially in the context of subarachnoid hemorrhage (SAH) and intracerebral hemorrhage. Computed tomography angiography (CTA) is guideline-recommended for IA detection and follow-up, but interpretation is time-consuming, variable across readers, and challenged by factors such as aneurysm size, scanner technology, image quality, and radiologist experience, resulting in widely varying reported sensitivities. With the increasing use of CTA in acute ischemic stroke (AIS) work-ups, radiologist workload has grown, particularly in non-SAH settings where excluding IAs is difficult. Prior computer-aided detection (CAD) systems and recent deep learning (DL) approaches have shown potential but often lacked large training datasets, digital subtraction angiography (DSA) verification, external validation, or real-world testing, limiting clinical applicability. This study aims to develop and validate a clinically applicable DL model for automatic IA detection and segmentation on bone-removal CTA, rigorously trained on DSA-verified data and evaluated across multiple real-world cohorts and scenarios.
Literature Review
Conventional CAD for IA detection on MRA/CTA relied on hand-crafted features (e.g., curvature, thresholding, region growing) with limited exploration of real-world generalization. DL has achieved expert-level performance in several medical imaging tasks and has been explored for IA detection primarily on MRA, reporting promising results. CTA-based DL/CAD for IAs has been rarely reported; two recent studies used relatively small datasets, lacked robust external reference standards (e.g., DSA), and did not test across varied clinical scenarios, risking an 'AI chasm' between algorithm development and real-world utility. The current work addresses these gaps by using large-scale, DSA-verified CTA for training, and validating across internal and external cohorts, varying image quality, scanner manufacturers, and simulated clinical workflows.
Methodology
Study design: Retrospective, multicohort diagnostic study using bone-removal head CTA from four Chinese hospitals, with DSA as the reference standard where available. Eight cohorts were assembled for training, tuning, and multi-scenario validation, including simulated real-world settings and AIS triage.

Data and cohorts: Internal cohort 1 (Jinling Hospital; 06/2009–03/2017) included 1,177 CTA cases (869 IA patients with 1,099 aneurysms; 308 controls), all DSA-verified within 30 days, split into training (n=927), tuning (n=100; 50 IA/50 control), and testing (n=150; 50% IA). Inclusion/exclusion criteria ensured high-quality, DSA-verified labels; cases with prior interventions, AVM/AVF, major vasculopathies, poor image quality, or DSA-positive/CTA-occult lesions were excluded from training. Additional validation cohorts: Internal cohort 2 (DSA-verified; 04–12/2017; n=245); Internal cohort 3 (DSA-verified; 01/2018–05/2019; n=226) for image-quality analysis; Internal cohort 4 (06–08/2019; n=374) and the LYG cohort (08/2018–09/2019; n=316) for simulated real-world validation and human-model comparison; Internal cohort 5 (AIS work-up; 2019; n=333) for triage evaluation; the NBH cohort (DSA-verified; 01–07/2019; n=211) for external validation; and the TJ cohort (2013–2018; n=147) for manufacturer effects (GE, Siemens, Toshiba). Bone-removal CTA DICOM images (Neuro DSA application, Syngo 2008G; Siemens) were used for annotation and modeling.

Reference standards and annotation: For DSA-verified patients, three neuroradiologists localized IAs on CTA with DSA reference to establish ground truth, followed by pixel-wise manual segmentation (Mimics v16). For non-DSA cohorts (Internal 4, LYG), two neuroradiologists established a silver standard using all available imaging and clinical data, with senior adjudication for consensus.

Model: DAResUNet, a 3D CNN segmentation network with an encoder-decoder architecture akin to 3D U-Net, using residual blocks for stable training of a deeper network, dilated convolutions in the top encoder layer to enlarge the receptive field, and a dual attention module to capture long-range context. Input: 80×80×80 3D patches.

Training: random patch sampling with a 50% probability of including an aneurysm; augmentations included rotation, scaling, and flipping. Intensity preprocessing: clipping to [0, 900] HU and normalization to [-1, 1]; adaptive windowing also considered [0, 450] and [-50, 650] based on vessel-region histogram analysis. Loss: weighted sum of binary cross-entropy and Dice loss. Optimization: Adam (momentum 0.9, weight decay 1e-4), poly learning-rate schedule (initial lr 1e-4), 100 epochs; ~60,000 patches per epoch (600 patients × 100 patches).

Inference: uniform-stride patch sampling with 1/8 overlap (stride 40) and voxel-wise maximum-probability fusion into a whole-volume prediction; vessel-region detection used thresholding and connectivity analysis to guide adaptive window selection.

Evaluation: Segmentation metrics included lesion-level sensitivity, Dice ratio, and false positives per case (FPs/case). Detection metrics included accuracy, patient-level sensitivity and specificity, PPV, and NPV with 95% Wilson CIs. Comparative analyses covered image quality (4-point scale), manufacturer effects (Bonferroni correction), and human-model comparisons with six radiologists (residents, attendings, assistant directors) and two neurosurgeons, blinded to clinical data, reading on standard clinical workstations with 3D tools; micro-averaged clinician metrics were computed.
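To make the intensity preprocessing concrete, below is a minimal NumPy sketch of the HU clipping and [-1, 1] normalization described above. It is illustrative only (the function name and float casting are ours); the authors' released code at the repository linked under Code availability is the authoritative implementation.

```python
import numpy as np

def preprocess_cta(volume_hu, window=(0, 900)):
    """Clip a bone-removal CTA volume to an HU window and rescale to [-1, 1].
    The default window matches the primary setting reported above; the adaptive
    alternatives ([0, 450] and [-50, 650]) can be passed in instead."""
    lo, hi = window
    clipped = np.clip(volume_hu.astype(np.float32), lo, hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0
```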
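The lesion-balanced patch sampling could look roughly like the following sketch. Centring patches on a random aneurysm voxel half of the time is our reading of the "50% probability to include aneurysms" rule; the helper name, RNG handling, and boundary clipping are assumptions, and the rotation/scaling/flipping augmentations are omitted.

```python
import numpy as np

def sample_training_patch(volume, mask, patch=80, p_lesion=0.5, rng=None):
    """Draw one 80x80x80 training patch and its segmentation mask.
    With probability p_lesion the patch is centred on a random aneurysm voxel;
    otherwise the centre is drawn uniformly. Assumes every volume dimension
    is at least `patch` voxels."""
    rng = rng or np.random.default_rng()
    half = patch // 2
    shape = np.asarray(volume.shape)
    if rng.random() < p_lesion and mask.any():
        lesion_voxels = np.argwhere(mask > 0)
        centre = lesion_voxels[rng.integers(len(lesion_voxels))]
    else:
        centre = rng.integers(half, shape - half + 1)
    centre = np.clip(centre, half, shape - half)  # keep the patch inside the volume
    lo = centre - half
    sl = tuple(slice(int(l), int(l) + patch) for l in lo)
    return volume[sl], mask[sl]
```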
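A hedged PyTorch sketch of the training objective and learning-rate schedule follows. The 50/50 BCE-Dice weighting and the poly exponent (0.9) are placeholder values, not figures reported above.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, bce_weight=0.5, eps=1e-6):
    """Weighted sum of binary cross-entropy and soft Dice loss for voxel-wise
    aneurysm segmentation. `target` is a float tensor of 0/1 voxel labels with
    the same shape as `logits`."""
    probs = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    intersection = (probs * target).sum()
    dice = (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)
    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)

def poly_lr(epoch, max_epochs=100, base_lr=1e-4, power=0.9):
    """Polynomial decay of the learning rate over 100 epochs; the exponent is
    a common default and an assumption here."""
    return base_lr * (1.0 - epoch / max_epochs) ** power
```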
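The uniform-stride inference with voxel-wise maximum fusion might be implemented along the lines below, assuming the network returns a single logit channel for 80³ inputs; padding, batching, and the vessel-guided adaptive windowing are simplified away.

```python
import numpy as np
import torch

@torch.no_grad()
def predict_volume(model, volume, patch=80, stride=40, device="cuda"):
    """Tile the normalized volume with 80^3 patches at stride 40 (half-patch
    offsets along each axis) and fuse overlapping predictions by voxel-wise
    maximum probability. Assumes every dimension is at least `patch` voxels."""
    model.eval()
    prob = np.zeros(volume.shape, dtype=np.float32)

    def starts(dim):
        s = list(range(0, dim - patch + 1, stride))
        if s[-1] != dim - patch:  # make sure the far edge is covered
            s.append(dim - patch)
        return s

    for z in starts(volume.shape[0]):
        for y in starts(volume.shape[1]):
            for x in starts(volume.shape[2]):
                crop = volume[z:z + patch, y:y + patch, x:x + patch]
                t = torch.from_numpy(np.ascontiguousarray(crop)).float()[None, None].to(device)
                p = torch.sigmoid(model(t))[0, 0].cpu().numpy()
                view = prob[z:z + patch, y:y + patch, x:x + patch]
                np.maximum(view, p, out=view)  # max-probability fusion of overlaps
    return prob
```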
Statistics: Pearson’s chi-squared or Fisher’s exact tests were used; superiority/non-inferiority (5% absolute margin) was assessed via the Wald method with Agresti-Caffo correction; significance was set at p<0.05.

Computational performance: mean processing time was ~17.6–19.6 s per exam, depending on cohort and setup.

Code availability: https://github.com/deepwise-code/DLIA.
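For the 95% Wilson confidence intervals quoted with the detection metrics, a small self-contained helper is sketched below; this is standard statistics rather than code from the repository.

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """Two-sided 95% Wilson score interval for a proportion, e.g.
    patient-level sensitivity = IA patients detected / all IA patients."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1.0 - p) / n + z * z / (4 * n * n))
    return (centre - half, centre + half)
```

For example, wilson_ci(73, 75) brackets a 97.3% proportion, consistent with 73 of 75 test-set IA patients being detected, although the exact counts behind each reported figure are not restated in this summary.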
Key Findings
• Development cohort (Internal 1, test set): With the threshold set for ~0.29 FPs/case, the model achieved high patient-level sensitivity (97.3%) with moderate specificity (74.7%); lesion-level sensitivity 95.6%, Dice 0.75; mean processing time 17.6 s per exam. All misses were small (<5 mm), three of them tiny (<3 mm); lesion-level sensitivity was 100% for aneurysms ≥5 mm and 98.6% for ≥3 mm.
• Internal validation (Internal 2; n=245; 145 aneurysms): Accuracy 88.6%, patient-level sensitivity 94.4%, specificity 83.9%, lesion-level sensitivity 84.1%, FPs/case 0.26. Lesion-level sensitivity by location: ACoA 100%, ACA 100%, VBA 100%, PCoA 87.9%, MCA 87.5%; lower for ICA 60.6%, PCA 66.7%, CA 66.7%. By size: <3 mm 51.7%; ≥3 mm 75.0%; ≥5 mm 95.8%; ≥10 mm 100%.
• External validation (NBH; n=211; 46 aneurysms): Accuracy 81.0%, patient-level sensitivity 84.6%, specificity 80.2%, lesion-level sensitivity 76.1%, FPs/case 0.27. Lesion-level sensitivity by location: 100% for MCA, ACoA, ACA, and PCA; 80.0% PCoA; 66.7% ICA; 62.5% VBA. By size: <3 mm 37.5%; ≥3 mm 84.2%; ≥5 mm 90.5%; ≥10 mm 100%. No significant differences between internal and external validation in patient-level sensitivity, lesion-level sensitivity, or specificity (p=0.114, 0.239, 0.400).
• Occult aneurysms: Among 31 CTA-negative/DSA-positive cases (39 aneurysms; mean size ~2.0 mm), the model detected 5 occult aneurysms (mean 2.7 mm) across ICA, CA, and ACA locations.
• Image-quality tolerance (Internal 3): Across quality scores 1–4, patient-level sensitivities were 66.7%, 93.3%, 75.0%, and 72.7%; lesion-level sensitivities 66.7%, 84.2%, 62.8%, and 66.7%; specificities 87.5%, 90.0%, 83.1%, and 92.3%; there were no significant differences among subgroups (Bonferroni-corrected p>0.05), indicating robustness to image quality.
• Manufacturer effects (TJ cohort): Lesion-level sensitivity was significantly higher for Siemens (89.3%) than GE (62.5%, p=0.001) and Toshiba (32.0%, p<0.001). Patient-level sensitivities: Siemens 90.5%, GE 69.2%, Toshiba 40.0%. Specificity was 100% for both GE and Siemens and 58.6% for Toshiba (differences not significant after correction).
• Human-model comparison (Internal 4 and LYG): The model had higher patient-level sensitivity than radiologists in Internal 4 (p=0.037) and higher than both radiologists (p=0.022) and neurosurgeons (p=0.037) in LYG; lesion-level sensitivity was comparable. However, the model’s specificity, accuracy, and PPV were significantly lower than clinicians’ (p<0.001). Reading time: ~18.2 s (Internal 4) and 19.6 s (LYG) per exam for the model, significantly faster than radiologists and comparable to neurosurgeons.
• AIS triage (Internal 5; n=333; 16 aneurysms): Specificity 89.7%, patient-level sensitivity 78.6%, NPV 99.0%. With model triage, 86.8% of patients were predicted negative, 99.0% of whom were true negatives; 13.2% were flagged as high risk for focused review. Five aneurysms in three patients were missed (three <4 mm; locations: MCA=3, ICA=2).
Discussion
The study demonstrates that a tailored 3D DL segmentation model (DAResUNet) trained on large, DSA-verified, bone-removal CTA can detect and localize intracranial aneurysms with high lesion-level sensitivity and improved patient-level sensitivity relative to clinicians in simulated real-world settings. Performance generalizes across internal and external cohorts and is relatively tolerant to variations in image quality, though scanner manufacturer and protocol differences can impact sensitivity. The model particularly excels for aneurysms ≥5 mm and common locations, while tiny lesions (<3–4 mm) and certain territories (e.g., ICA, VBA, CA) remain more challenging. In AIS workflows, the model’s very high NPV enables reliable exclusion of IA-negative cases to prioritize radiologist attention and potentially reduce workload. Despite lower specificity and PPV than human readers, the model’s segmentation outputs provide interpretable visual cues that can assist clinicians, especially in difficult or ambiguous cases. These findings address the initial clinical need to improve IA detection efficiency and consistency, supporting the model’s role as a complementary tool within routine CTA interpretation.
Conclusion
A clinically applicable DL model for IA detection and segmentation on bone-removal CTA was developed and validated across multiple cohorts and real-world scenarios. It achieves high lesion-level sensitivity, improved patient-level sensitivity versus clinicians, rapid processing, and robust performance across image quality levels, with some sensitivity variation across scanner manufacturers. The model shows promise for triaging AIS CTA studies by confidently excluding IA-negative cases, potentially reducing radiologist workload. Future work should include prospective, multicenter controlled studies; expansion of training data to encompass diverse scanners, protocols, and pathologies; evaluation of radiologist performance augmented by the model; optimization to improve specificity and detection of tiny aneurysms; and assessment of clinical outcomes and cost-effectiveness for routine workflow integration.
Limitations
• Validation cohorts had relatively few positive cases because of low IA prevalence in some clinical settings, limiting the precision of estimates.
• Training data excluded cases with AVM/AVF and head trauma; performance on such presentations remains untested and may be lower.
• Radiologist performance augmented by the model was not evaluated; device approval and integration workflows remain to be studied.
• Patient-level sensitivity varied across cohorts, and performance was affected by scanner manufacturer and acquisition protocols, indicating a need for more diverse training data.
• No prospective multicenter controlled study was conducted; the current study is retrospective and only simulated real-world conditions.