Introduction
Osteoporotic vertebral compression fractures (OVCFs) are the most prevalent fragility fractures, causing significant morbidity and mortality. The insidious onset of OVCFs leads to underdiagnosis, hindering timely treatment and increasing the risk of further fractures. While medical imaging techniques like X-rays, MRI, and CT are used for diagnosis, each has limitations. X-rays lack sensitivity and specificity; MRI is expensive and time-consuming; and CT, while more accurate than X-rays, can miss occult fractures. The advent of deep learning (DL) offers potential for improving OVCF diagnosis. Previous DL-assisted systems have shown promise but mostly focused on simply identifying the presence or absence of OVCFs, without precise localization. Moreover, these studies did not consider other vertebra diseases that often coexist. This study aimed to develop a DL-assisted diagnostic system capable of single vertebra-level diagnosis, differentiating between OVCFs, old fractures (OFs), Schmorl's nodes (SNs), Kummell's disease (KD), and previous surgery (PS) using CT images. Accurate differentiation is vital for guiding appropriate surgical and conservative management strategies.
Literature Review
Several studies have explored DL-assisted OVCF diagnosis using CT images. Tomita et al. developed a coupled DL system that analyzed whole-spine sagittal CT images and achieved a sensitivity, specificity, and AUC of 0.85, 0.96, and 0.91, respectively. However, this system did not pinpoint the location of fractures. Kolanu et al. reported a computer-aided diagnosis (CAD) system with a specificity of 0.92 and a sensitivity of 0.54, but it also lacked precise localization. Other studies have focused on X-ray-based DL models for fracture recognition, but these are limited by factors such as image occlusion. This study builds on previous research by addressing the need for a more comprehensive and precise diagnostic system that accounts for multiple vertebra disease types and provides single vertebra-level diagnosis using CT, a modality better suited than X-ray to this task.
Methodology
This retrospective study used CT images from 1,051 patients with OVCFs from Beijing Luhe Hospital (training and testing datasets) and 46 patients from Xuanwu Hospital (external validation dataset). All patients underwent X-ray, CT, and MRI examinations. Three experienced spine surgeons annotated the CT images. For vertebra detection, every vertebra was annotated with a bounding box (Bbox). For classification, only injured vertebrae were annotated, each assigned to one or more of the categories OVCF, OF, SN, KD, and PS. A two-stage DL system was developed: a VDModule that detects and localizes each vertebra, followed by a VCModule that classifies each detected vertebra. The VDModule employed a Faster R-CNN architecture with either ResNet18 or MobileNet v2 as the backbone, and data augmentation was used to enlarge the training set. The VCModule used a multi-output DL model based on a pre-trained ResNet50, so that a single vertebra can receive more than one label when diseases co-occur. Random oversampling and undersampling were used to address class imbalance. Performance of both modules was evaluated with the area under the ROC curve (AUC), precision-recall curves, mean average precision (mAP), sensitivity, specificity, positive and negative predictive values (PPV, NPV), and F1-score. Statistical analysis included Pearson's chi-square test, ANOVA, Bartlett's test, and Tukey's multiple comparisons.
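The two-stage flow described above can be sketched as follows. This is a minimal, framework-free sketch, not the authors' implementation: `detect` and `classify` are hypothetical stand-ins for the trained Faster R-CNN and ResNet50 models, and the class order in `CLASSES` and both 0.5 thresholds are assumptions. Each class receives an independent probability, so co-occurring diseases on the same vertebra can all be reported.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical output order of the multi-output classification head.
CLASSES = ("OVCF", "OF", "SN", "KD", "PS")

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) pixel coordinates
Image = Sequence[Sequence[float]]  # one CT slice as rows of intensities


@dataclass
class VertebraResult:
    box: Box
    labels: List[str]  # empty list: no disease label passed the threshold


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the Bbox region out of a 2D slice."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def two_stage_diagnose(
    image: Image,
    detect: Callable[[Image], List[Tuple[Box, float]]],
    classify: Callable[[List[List[float]]], Sequence[float]],
    det_threshold: float = 0.5,  # assumed detection score cut-off
    cls_threshold: float = 0.5,  # assumed per-class probability cut-off
) -> List[VertebraResult]:
    results = []
    # Stage 1 (VDModule role): keep only confident vertebra detections.
    for box, score in detect(image):
        if score < det_threshold:
            continue
        # Stage 2 (VCModule role): one independent probability per class,
        # so a vertebra may receive several labels at once.
        probs = classify(crop(image, box))
        labels = [c for c, p in zip(CLASSES, probs) if p >= cls_threshold]
        results.append(VertebraResult(box, labels))
    return results


# Toy stand-ins for the trained networks (hypothetical, for illustration only).
def fake_detect(image: Image) -> List[Tuple[Box, float]]:
    return [((0, 0, 32, 16), 0.98), ((0, 16, 32, 32), 0.30)]


def fake_classify(patch: List[List[float]]) -> List[float]:
    return [0.91, 0.10, 0.62, 0.05, 0.01]


slice_ = [[0.0] * 64 for _ in range(64)]
for result in two_stage_diagnose(slice_, fake_detect, fake_classify):
    print(result.box, result.labels)  # (0, 0, 32, 16) ['OVCF', 'SN']
```

Thresholding each output separately, rather than taking a single arg-max, is what lets a multi-output head return more than one label per vertebra; in practice such thresholds would be tuned on held-out data.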
Key Findings
The ResNet18-based VDModule achieved excellent performance in vertebra detection, with an AUC of 0.982, a false-positive (FP) rate of 1.52%, and a false-negative (FN) rate of 1.33% on the testing dataset. The ResNet50-based VCModule diagnosed OVCF, OF, KD, and PS with high accuracy. On the Luhe Hospital testing dataset, the average sensitivity and specificity were 0.919 and 0.995, respectively; on the Xuanwu Hospital validation dataset, they were 0.891 and 0.989, respectively, indicating good generalization to an independent center. Performance for SN was comparatively poor, likely because of the small number of SN samples and the similarity of SN features to those of OVCF and OF. Overall, the results show that the two-stage DL system can diagnose multiple vertebra diseases at the single-vertebra level, which is more precise than earlier slice-level approaches.
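The average sensitivity and specificity above summarize per-class results. The exact averaging scheme is not stated here, so the sketch below assumes an unweighted (macro) mean of one-vs-rest metrics computed on vertebra-level label sets. The toy labels are invented for illustration, and metrics with a zero denominator are returned as NaN and skipped when averaging.

```python
import math
from typing import Dict, List, Sequence, Set

Metrics = Dict[str, float]


def ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the metric is undefined (den == 0)."""
    return num / den if den else float("nan")


def per_class_metrics(
    y_true: List[Set[str]], y_pred: List[Set[str]], classes: Sequence[str]
) -> Dict[str, Metrics]:
    """One-vs-rest confusion counts per class over vertebra-level label sets."""
    results = {}
    for c in classes:
        tp = fp = tn = fn = 0
        for truth, pred in zip(y_true, y_pred):
            actual, predicted = c in truth, c in pred
            if actual and predicted:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
            else:
                tn += 1
        results[c] = {
            "sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp),
            "npv": ratio(tn, tn + fn),
            "f1": ratio(2 * tp, 2 * tp + fp + fn),
        }
    return results


def macro_average(metrics: Dict[str, Metrics], key: str) -> float:
    """Unweighted mean over classes, skipping classes where `key` is undefined."""
    values = [m[key] for m in metrics.values() if not math.isnan(m[key])]
    return sum(values) / len(values) if values else float("nan")


# Toy vertebra-level labels (made up for illustration).
classes = ("OVCF", "OF", "SN", "KD", "PS")
y_true = [{"OVCF"}, {"OVCF"}, {"OF"}, {"KD"}, set(), {"OVCF", "SN"}]
y_pred = [{"OVCF"}, {"OF"}, {"OF"}, {"KD"}, set(), {"OVCF"}]
m = per_class_metrics(y_true, y_pred, classes)
print(f"{macro_average(m, 'sensitivity'):.3f}")  # 0.667 (PS has no positives, so it is skipped)
print(f"{macro_average(m, 'specificity'):.3f}")  # 0.960
```

Macro averaging weights every disease type equally, so a rare class with poor sensitivity (SN in the toy data) pulls the average down as much as a common one would; a sample-weighted average would hide that effect.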
Discussion
These findings demonstrate the feasibility and effectiveness of a DL-based system for accurate and rapid diagnosis of multi-type vertebra diseases. The high sensitivity and specificity achieved for OVCF, OF, KD, and PS compare favorably with earlier systems that only detected the presence or absence of OVCFs. Because the system reports a diagnosis for each individual vertebra, it localizes lesions more precisely than presence/absence or slice-level approaches. Good performance on the independent validation dataset from Xuanwu Hospital supports the system's generalizability. The relatively poor performance for SN underscores the need for larger datasets for less frequent disease types. The system could improve the efficiency and reliability of vertebral fracture diagnosis, particularly in resource-constrained settings or emergencies, and could be integrated into clinical workflows to improve radiologist efficiency.
Conclusion
This study presented a novel deep learning system for diagnosing five types of vertebra diseases from CT images, achieving high diagnostic accuracy for four of them (OVCF, OF, KD, and PS) and demonstrating good generalizability. The system's ability to perform single vertebra-level diagnosis represents a significant improvement over previous approaches. Future work should focus on expanding the dataset, particularly for SN, to improve diagnostic performance for this disease type. Further research could also explore 3D models and patient-level diagnosis. Clinical trials are needed to fully evaluate the system's impact on patient outcomes and workflow in real-world settings.
Limitations
The study's limitations include class imbalance in the dataset, particularly for SN and KD. Although random oversampling and undersampling were used to mitigate this, resampling can itself introduce bias. The system currently diagnoses at the vertebra and slice levels rather than at the patient level, limiting its ability to provide a comprehensive assessment of the entire spine. The relatively poor performance for SN highlights the need for a larger and potentially higher-quality SN dataset. Further validation with larger multicenter datasets, and direct comparison with human expert performance, are needed to confirm the system's reliability and clinical utility.
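To make the bias concern concrete, the following sketch shows plain random oversampling and undersampling to a fixed per-class count. It assumes each annotated vertebra has a single primary label and uses made-up sample IDs and a hypothetical `target` count; the study's actual resampling ratios are not given here. The minority class reaches the target only by repeating the same few vertebrae, which can encourage overfitting to those cases, while undersampling discards majority-class examples.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (vertebra_id, label)


def resample_to_target(samples: Sequence[Sample], target: int, seed: int = 0) -> List[Sample]:
    """Randomly over- or undersample every class to exactly `target` items."""
    rng = random.Random(seed)
    by_class: Dict[str, List[str]] = defaultdict(list)
    for vertebra_id, label in samples:
        by_class[label].append(vertebra_id)

    balanced: List[Sample] = []
    for label in sorted(by_class):
        ids = by_class[label]
        if len(ids) >= target:
            # Undersampling: keep a random subset, drawn without replacement.
            chosen = rng.sample(ids, target)
        else:
            # Oversampling: keep every original and add random duplicates.
            chosen = ids + rng.choices(ids, k=target - len(ids))
        balanced.extend((vertebra_id, label) for vertebra_id in chosen)
    rng.shuffle(balanced)
    return balanced


# Made-up, imbalanced toy data: many OVCF, few SN and KD.
data = (
    [(f"ovcf{i}", "OVCF") for i in range(8)]
    + [(f"sn{i}", "SN") for i in range(2)]
    + [(f"kd{i}", "KD") for i in range(3)]
)
balanced = resample_to_target(data, target=5)
print(Counter(label for _, label in balanced))
print(len({vid for vid, label in balanced if label == "SN"}), "unique SN vertebrae")  # 2 unique SN vertebrae
```

After balancing, the five SN entries come from only two distinct vertebrae, so any peculiarity of those two cases is amplified during training.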