Enhancing Medical Diagnosis with AI: A Focus on Respiratory Disease Detection


S. Sharma, S. Pandey, et al.

Discover a non-invasive software method for detecting respiratory diseases, developed by Sachin Sharma, Siddhant Pandey, and Dharmesh Shah. Using audio recordings from an electronic stethoscope and convolutional neural networks, this research offers a user-friendly web application that classifies respiratory diseases with 90% accuracy.

Introduction
Respiratory diseases are among the leading causes of death worldwide, with chronic obstructive pulmonary diseases (COPDs) notably prevalent in post‑pandemic statistics. This work aims to develop a noninvasive software method to detect respiratory diseases using internal respiratory audio analyzed by convolutional neural networks (CNNs) trained on an open‑source Respiratory Sound Database (Kaggle). Traditional diagnostic techniques (spirometry, chest X‑ray, CT scan, arterial blood gas analysis) have limitations such as unsuitability for certain cardiac patients, side effects like breathlessness and dizziness, and exposure to harmful radiation. Given these drawbacks, AI‑based analysis of images or internal respiratory audio can assist timely COPD detection. The paper outlines characteristic respiratory sound patterns indicative of disorders, related work, the proposed methodology and CNN architecture, experimental setup and observations, Streamlit‑based deployment, and results and conclusions.
Literature Review
Prior work includes: (1) Breath analysis devices for COPD detection; (2) A multilayer perceptron (ANN/MLP) predicting respiratory audio events in asthma and COPD with overall machine performance of 81.0% and peak/non‑peak precision/recall around 77–84%; (3) A real‑time integrated platform using hybrid ML classifiers (SVM, random forest, predicate‑based) achieving 94% correct classification for COPD series; (4) Telemedicine/self‑screening approaches employing a deep CNN (six conv, three max‑pool, three fully connected layers) using log‑scaled Mel‑spectral inputs, benchmarked against five physicians; (5) A large study using a digital stethoscope with 17,930 lung sounds on 1,630 subjects, comparing CNNs and SVMs across classification tasks (e.g., 86% CNN vs 86% SVM on healthy vs unhealthy, 80% CNN vs 80% SVM on singular respiratory sound, etc.). Additional works report CNN/MFCC pipelines for respiratory sound classification and patient‑specific tuning for wearable devices. These studies collectively motivate deep learning on time‑frequency features (e.g., MFCCs) for accurate respiratory sound analysis.
Methodology
Data: The open‑source Respiratory Sound Database (Kaggle) comprises 920 annotated recordings (10–90 s) from 126 patients, captured with an electronic stethoscope: 6,898 respiratory cycles (~5.5 hours) in total, including 1,864 with crackles, 886 with wheezes, and 506 with both. The data span children, adults, and the elderly and include both clean and noisy recordings. Dataset artifacts: 920 .wav files, 920 corresponding .txt annotations, diagnosis listings, a naming‑format description, 91 names, and detailed demographics.

Preprocessing and feature extraction: Audio was converted to mono for standardization, efficiency, consistency, noise reduction, and model compatibility. MFCCs were computed with Librosa following the standard steps: frame the signal; compute the power spectrum per frame; apply the Mel filter bank and sum the energies; take the log of the energies; compute the DCT of the log filter‑bank outputs. The resulting features (e.g., 40 MFCCs over frames) are concatenated with diagnosis labels to form the feature–target table.

Data augmentation: To expand and diversify the training data to 1,428 audio files, pitch shifting, time stretching, additive noise, time and frequency masking, dynamic range compression, and resampling were applied.

Model architecture: A sequential CNN performs nonlinear pattern recognition on the time–frequency input. Input shape (40, 862, 1): 40 MFCCs, 862 frames (with padding), mono channel. Four Conv2D layers with filter counts 16, 32, 64, and 128 and 2×2 kernels, each followed by MaxPooling2D; ReLU activations; 20% dropout on the convolutional layers. A GlobalAveragePooling2D layer feeds a dense softmax output with six nodes (num_labels), which yields class probabilities.

Training setup: The Python implementation was trained on Google Colaboratory (NVIDIA K80 GPU; ~12 GB GPU RAM, 13 GB system RAM) with an 80:20 train/test split, categorical cross‑entropy loss, the Adam optimizer, 100 epochs, and a learning rate of 0.01 decayed by a factor of 0.1 every 8 epochs.

Deployment: A Streamlit‑based web application was developed. The app includes an audio file uploader, automatically runs inference upon upload, and displays the predicted class along with a waveform visualization for interpretability. Ethics approval to carry out the study was obtained from a pulmonary and critical care physician in Ahmedabad, India.
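The MFCC steps described above (framing, power spectrum per frame, Mel filter bank, log energies, DCT) can be sketched from scratch as follows. The paper uses Librosa's implementation; the frame length, hop size, and filter count here are illustrative assumptions, not values stated in the text.

```python
# From-scratch sketch of the MFCC pipeline; frame/hop/filter sizes are assumptions.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_mfcc=40, n_fft=2048, hop=512, n_mels=128):
    # 1. Frame the signal and apply a Hann window
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = frames * np.hanning(n_fft)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Triangular Mel filter bank, evenly spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fbank[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    # 4. Sum filter-bank energies and take the log
    log_energy = np.log(power @ fbank.T + 1e-10)
    # 5. DCT of the log energies; keep the first n_mfcc coefficients
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_mfcc].T
```

The result has shape (n_mfcc, n_frames); in the paper's pipeline, the frame axis would then be zero-padded to 862 to match the model's input.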
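The described architecture could be assembled in Keras roughly as follows; everything beyond the stated input shape, filter counts, 2×2 kernels, 20% dropout, pooling, optimizer, loss, and step-decay schedule is an assumption.

```python
# Minimal Keras sketch of the four-block CNN described in the text.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(40, 862, 1), num_labels=6):
    model = models.Sequential([layers.Input(shape=input_shape)])
    # Four Conv2D blocks: 16/32/64/128 filters, 2x2 kernels, ReLU,
    # each followed by max pooling and 20% dropout
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv2D(filters, kernel_size=2, activation='relu'))
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.Dropout(0.2))
    # Global average pooling into a six-way softmax head
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(num_labels, activation='softmax'))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model

def schedule(epoch, lr):
    # Step decay noted in the text: multiply the rate by 0.1 every 8 epochs
    return lr * 0.1 if epoch > 0 and epoch % 8 == 0 else lr
```

Training would then pass `tf.keras.callbacks.LearningRateScheduler(schedule)` to `model.fit` for the 100-epoch run on the 80:20 split.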
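The deployment described above (uploader, automatic inference on upload, waveform plot) might look like the following Streamlit sketch. The model filename and the class-name list are hypothetical placeholders, and `main()` would be called at the bottom of the script run via `streamlit run app.py`.

```python
# Hedged sketch of the Streamlit app; model path and label order are hypothetical.
import numpy as np

# Hypothetical label order; in practice it comes from the dataset's diagnosis list
CLASS_NAMES = ["Bronchiectasis", "Bronchiolitis", "COPD",
               "Healthy", "Pneumonia", "URTI"]

def predict_label(probs, names=CLASS_NAMES):
    """Map a softmax probability vector to a human-readable class name."""
    return names[int(np.argmax(probs))]

def main():
    # Imported lazily so the helper above stays usable without Streamlit installed
    import streamlit as st
    import librosa
    import matplotlib.pyplot as plt
    import tensorflow as tf

    st.title("Respiratory Disease Detection")
    uploaded = st.file_uploader("Upload a respiratory recording (.wav)", type=["wav"])
    if uploaded is not None:
        signal, sr = librosa.load(uploaded, sr=None, mono=True)  # mono, as in training
        # Waveform visualization for interpretability
        fig, ax = plt.subplots()
        ax.plot(np.arange(len(signal)) / sr, signal)
        ax.set(xlabel="Time (s)", ylabel="Amplitude")
        st.pyplot(fig)
        # Inference runs automatically on upload; the model file is a placeholder
        model = tf.keras.models.load_model("respiratory_cnn.h5")
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
        mfcc = np.pad(mfcc, ((0, 0), (0, max(0, 862 - mfcc.shape[1]))))[:, :862]
        probs = model.predict(mfcc[np.newaxis, ..., np.newaxis])[0]
        st.write(f"Predicted class: {predict_label(probs)}")
```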
Key Findings
- The four‑layer CNN trained on MFCC features from mono audio achieved an overall accuracy of 90% on the test set.
- Classification metrics (precision, recall, F1, accuracy) were computed, and ROC curves were provided for each class (figures referenced).
- Performance notably surpasses traditional Hidden Markov Model approaches for respiratory sound classification (e.g., best official ICBHI score reported at 39.56) and is significantly higher than classical SVM‑based baselines mentioned in related literature.
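The reported metrics could be reproduced from held-out predictions roughly as below, assuming scikit-learn; the helper names and any data passed in are illustrative, not from the paper.

```python
# Sketch of the evaluation: per-class precision/recall/F1, accuracy, and ROC/AUC.
import numpy as np
from sklearn.metrics import classification_report, roc_curve, auc

def evaluate(y_true, y_pred):
    """Per-class precision/recall/F1 plus overall accuracy, as a dict."""
    return classification_report(y_true, y_pred, output_dict=True, zero_division=0)

def per_class_roc(y_true_onehot, y_prob):
    """One ROC curve and AUC per class, mirroring the per-class ROC figures."""
    curves = {}
    for k in range(y_prob.shape[1]):
        fpr, tpr, _ = roc_curve(y_true_onehot[:, k], y_prob[:, k])
        curves[k] = (fpr, tpr, auc(fpr, tpr))
    return curves
```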
Discussion
The study demonstrates that AI‑based analysis of respiratory sounds can effectively detect respiratory diseases, addressing the limitations of traditional diagnostic methods that are invasive or involve radiation exposure. By leveraging MFCC features and a compact CNN architecture with data augmentation, the model captures nonlinear patterns in lung sounds and attains high accuracy (90%). The Streamlit deployment translates the model into an accessible tool that can assist clinicians with early and accurate diagnosis, potentially reducing workload and improving decision‑making. Visualizing the audio waveform alongside predictions enhances interpretability for diverse users, including clinicians and patients.
Conclusion
A CNN‑based system using MFCC features and data augmentation on an open respiratory sound dataset achieved 90% accuracy for multi‑class respiratory disease detection. The approach is realized in a user‑friendly Streamlit web application that provides rapid, noninvasive support for early and accurate diagnosis, potentially easing clinician workload and improving patient care.
Limitations