Introduction
Auscultation, the process of listening to breath sounds, is a common clinical tool despite known inconsistencies in interpretation. Inter-observer variability among physicians, influenced by experience and skill, remains a challenge. While digital stethoscopes and spectrograms offer enhanced resolution, human subjectivity remains a significant limitation. The study aimed to leverage AI to improve breath sound identification and compare the performance of physicians and AI models in distinguishing between wheezes and crackles, acknowledging the inherent subjectivity in defining these sounds. The researchers hypothesized that the characteristics of adventitious sounds themselves might contribute to classification difficulties and wanted to examine the robustness of deep learning models against different sound characteristics.
Literature Review
Previous studies have highlighted the reproducibility and reliability of auscultation but also its inter-observer variability, particularly concerning crackles. The limited contemporary role of lung auscultation in favor of superior diagnostic modalities such as ultrasound or radiography has been suggested. However, digital stethoscopes and spectrograms are improving auscultation's accuracy, and the use of machine learning holds promise for more objective breath sound analysis. Prior research suggests that wheezes are easier to identify than crackles due to their distinct acoustic characteristics. The study builds upon this by directly comparing physician and AI performance on a large dataset.
Methodology
This cross-sectional comparative study used breath sounds from the Formosa Archive of Breath Sound recorded at four sites on both lungs of 199 non-trauma patients (aged >20) in a hospital emergency department. Breath sounds were recorded using a digital stethoscope and converted into mel-spectrograms. Five physicians independently labeled each sound into five categories: normal, wheezing, crackles, unknown, and no breath sounds. Six AI models were developed: five emulating individual physicians and one using all data. Discrepancies between physician labels, the all-data AI model, and the majority output from the five AI models were considered doubtful and re-labeled by two additional physicians. The final labels were determined by a majority vote. Sensitivity, specificity, and the area under the receiver-operating characteristic curve (AUROC) were calculated to evaluate the performance of both physicians and AI models.
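The methodology converts each recording into a mel-spectrogram before labeling and model training. As a minimal illustrative sketch (the sampling rate, FFT size, and mel-band count below are assumptions for demonstration, not the study's actual parameters), the conversion can be expressed with NumPy and SciPy:

```python
# Sketch: waveform -> mel-spectrogram. Parameters are illustrative assumptions.
import numpy as np
from scipy.signal import stft

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(signal, sr=4000, n_fft=256, n_mels=40):
    _, _, Z = stft(signal, fs=sr, nperseg=n_fft)
    power = np.abs(Z) ** 2                          # power spectrogram
    mel = mel_filterbank(sr, n_fft, n_mels) @ power  # apply mel filters
    return 10.0 * np.log10(mel + 1e-10)             # log-mel in dB

# Example: 1 s of synthetic noise standing in for a breath-sound recording.
spec = mel_spectrogram(np.random.randn(4000))
print(spec.shape)  # (n_mels, n_frames)
```

In practice such pipelines typically use an audio library (e.g. librosa or torchaudio) rather than a hand-rolled filterbank; the sketch only shows the transformation the study describes.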
Key Findings
A total of 11,532 breath sound files were labeled, yielding 579 doubtful labels. After relabeling and exclusion, 305 labels were retained as the gold standard. For wheezing, physicians and the all-data AI model showed comparably high sensitivity (89.5% vs. 86.6%) and specificity (96.4% vs. 95.2%). For crackles, sensitivity remained relatively high (93.9% vs. 80.3%) but specificity was low (56.6% vs. 65.9%), and AUROC values were lower for crackles than for wheezes. Thus, while the AI model performed well for wheezing, it did not significantly improve the identification of crackles. Table 2 provides a detailed comparison between the human physicians and the all-data AI model in breath sound identification, Figure 3 shows the ROC curve comparisons, and Table 3 summarizes the meaning and implications of the study.
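The reported metrics (sensitivity, specificity, AUROC) can be computed from binary labels and classifier scores. A minimal NumPy sketch for one label such as "crackles present" (the data below are illustrative, not study data, and the rank-based AUROC assumes no tied scores):

```python
# Sketch: sensitivity, specificity, and AUROC for a binary label.
# Illustrative data only; not taken from the study.
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def auroc(y_true, scores):
    # Rank-based AUROC, equivalent to the Mann-Whitney U statistic
    # (assumes no tied scores).
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]          # gold-standard labels
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]          # thresholded predictions
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # model probabilities
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={auroc(y_true, scores):.2f}")
# -> sensitivity=0.75 specificity=0.75 AUROC=0.94
```

In a real evaluation one would typically use a library implementation such as scikit-learn's `roc_auc_score`; the sketch just makes the definitions behind the study's figures explicit.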
Discussion
The study's findings highlight the difficulty of accurately identifying crackles compared to wheezes, for both human physicians and AI models. This performance gap may be attributed to the inherent acoustic characteristics of crackles: their brief, discontinuous, and explosive nature, along with the masking influence of background breath sounds. The low specificity for crackles, even with AI assistance, underscores the need for caution when basing medical decisions solely on this sound and reinforces the importance of supplementary diagnostic tests for confirming diagnoses. The results align with prior research showing that wheezes are identified more reliably than crackles.
Conclusion
This study demonstrated that both physicians and AI models struggle to accurately identify crackles compared to wheezes. The low specificity of crackles indicates that medical decisions based on their presence should be approached with caution and confirmed with additional examinations. Future research should focus on improving crackle identification techniques, potentially through enhanced data augmentation, refined AI algorithms, and a standardized definition for crackles.
Limitations
The study's limitations include its single-center design, which may limit generalizability. The specific digital stethoscope and AI algorithms used might also affect the results. The definition of crackles and wheezes, while based on established guidelines, remains inherently subjective, potentially influencing the inter-rater reliability. Further research is needed with diverse patient populations and alternative technologies to validate the findings.