Fast and noninvasive electronic nose for sniffing out COVID-19 based on exhaled breath-print recognition

D. K. Nurputra, A. Kusumaatmadja, et al.

GeNose C19 is a low-cost, portable electronic nose developed by a team of researchers including Dian Kesumapramudya Nurputra and Ahmad Kusumaatmadja. It enables rapid, noninvasive COVID-19 screening from exhaled breath, with high reported accuracy, sensitivity, and specificity.

Introduction

SARS-CoV-2 has caused a global pandemic with substantial morbidity and mortality. While RT-qPCR is the accepted diagnostic standard, its use as a mass screening tool is limited by resource intensity, invasiveness of sampling, and cost. These constraints are especially acute in low- and middle-income settings. This study addresses whether a portable electronic nose that analyzes exhaled breath volatile organic compound (VOC) patterns, combined with machine learning, can rapidly and noninvasively differentiate RT-qPCR-confirmed COVID-19-positive individuals from negative ones. The purpose is to develop, implement, and evaluate GeNose C19 as a fast, low-cost screening tool and to assess its diagnostic performance against RT-qPCR.

Literature Review

Electronic noses using arrays of chemoresistive metal oxide semiconductor (MOS) sensors have shown promise for VOC-based diagnostics across respiratory diseases. Prior work has suggested breath VOC signatures can differentiate COVID-19 patients, though identified biomarkers vary across studies and geographies. Mass spectrometry-based breathomics in the UK, Germany, France, China, and the USA reported multiple candidate VOCs (e.g., aldehydes, ketones like acetone and butanone, esters such as ethyl butyrate, hydrocarbons, and alcohols), highlighting heterogeneity due to methods, populations, and environments. Electronic nose approaches focus on pattern recognition of complex VOC mixtures rather than quantifying specific analytes, enabling portable, rapid screening. Previous sensor systems (including MQ-series MOS sensors) and portable metabolic analysis units have been explored for breath analysis. However, factors such as sampling methods (end-tidal vs mixed expiratory), ambient humidity/temperature, and confounders (diet, smoking, comorbidities) affect VOC profiles and e-nose performance, necessitating standardized protocols and robust algorithms.

Methodology

Study design and ethics: Open-label, case-control, prospective cohort study approved by the Medical and Health Research Ethics Committee, Universitas Gadjah Mada/Dr. Sardjito General Hospital (KE/189/08/2020) and registered at ClinicalTrials.gov (NCT05483712). Subjects were recruited consecutively with informed consent.
Participants and samples: 83 subjects (43 RT-qPCR-positive, 40 RT-qPCR-negative) admitted to two hospitals in the Special Region of Yogyakarta, Indonesia (Bhayangkara General Hospital, Sleman District; Bantul District COVID-19 Special Field Hospital). Two positive subjects were excluded due to clinical deterioration. Breath samples were collected daily during hospitalization; invalid samples were excluded, yielding 615 valid samples: 333 positive and 282 negative.
Reference standard: RT-qPCR on nasopharyngeal/oropharyngeal swabs (targeting RdRp and E genes; LightCycler 480, Roche), performed at certified national laboratories following WHO/CDC protocols.
Device (GeNose C19): Portable e-nose comprising two main units: (1) a sensing unit with a sealed chamber containing an array of 10 different MOS chemoresistive gas sensors (S1–S10) with internal heaters (total power ~6 W), a micropump (flow 1 ± 0.2 L/min), a three-way solenoid valve (delay, sampling, purging), environmental sensors for temperature and humidity, a power module, and data acquisition (microcontroller plus 16-channel external ADC; Bluetooth/USB data transfer); and (2) a breath sampling unit with a HEPA filter (water absorber element) and a disposable 1 L medical-grade PVC sampling bag connected via 4 mm OD medical-grade PTFE tubing. HEPA filtration was verified by RT-qPCR of the intake tubing after measurement (negative), indicating effective virus trapping.
Operation cycles: Each measurement ran a delay phase of 10 s (ambient air baseline), a sampling phase of 40 s (breath VOC exposure of the sensor array to near-saturation), and a purging phase of 120 s (ambient air to clear VOCs). Exhaust was vented to the room.
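
The three-phase operation cycle described above can be written down as a small timing table. The following is a minimal sketch, not the device firmware: the phase durations are taken from the text, while the `Phase` class, the `valve_source` labels, and the helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    duration_s: int
    valve_source: str  # illustrative label for what the valve routes to the chamber


# Durations come from the text; the valve_source labels are assumptions.
CYCLE = (
    Phase("delay", 10, "ambient air"),
    Phase("sampling", 40, "breath bag"),
    Phase("purging", 120, "ambient air"),
)


def phase_at(t_s: float) -> Phase:
    """Return the phase that is active t_s seconds into one measurement cycle."""
    if t_s < 0:
        raise ValueError("t_s must be non-negative")
    start = 0
    for phase in CYCLE:
        if t_s < start + phase.duration_s:
            return phase
        start += phase.duration_s
    raise ValueError("t_s is past the end of the cycle")


def sampled_volume_l(flow_l_per_min: float = 1.0) -> float:
    """Volume drawn during the sampling phase, assuming the pump runs for all of it."""
    sampling = next(p for p in CYCLE if p.name == "sampling")
    return flow_l_per_min * sampling.duration_s / 60


total_s = sum(p.duration_s for p in CYCLE)  # 170 s per cycle
```

Under these assumptions one cycle lasts 170 s, in line with the roughly 3 minutes from sampling to decision cited in the Discussion. At the nominal 1 L/min flow, the 40 s sampling phase draws about 0.67 L, which fits within the 1 L sampling bag.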
Preconditioning and environment control: The device was preheated for ≥30 min, and the chamber was flushed with ambient air for ~30 min before use. It was operated only when chamber humidity and temperature were within 30–50% RH and 26–42°C. Placement minimized interfering odors; baseline drift without a sample was limited (<400 mV). Subjects refrained from eating or drinking anything but water for at least 1 hour before breath collection.
Breath sampling protocol: End-tidal breath was collected into 1 L bags by trained nurses using a non-rebreathing mask. Patients took two initial breaths; the third end-expiratory breath was collected to minimize dead space and oral contamination. Bags were promptly sealed, then connected to the HEPA filter and device inlet for measurement.
Signal processing and features: Sensor signals were standardized to their baselines. Four time-domain features were extracted per sensor (maximum, median, standard deviation, and variance), forming feature vectors from the 10-sensor array.
Machine learning: Four supervised classifiers were evaluated: LDA (baseline, parameter-free), SVM (hyperparameters tuned via grid search; TPOT genetic algorithm used to optimize ML pipelines), stacked multilayer perceptron (MLP), and deep neural network (DNN).
DNN architecture/hyperparameters: Input features from 10 sensors × 4 features; two hidden layers with 500 and 250 neurons (ReLU activations), dropout 0.1 after each hidden layer, sigmoid output, binary cross-entropy loss, Adam optimizer, batch size 5, up to 500 epochs, validation split 0.2, early stopping (patience 5).
Data split: 70% training, 30% testing; repeated 10-fold cross-validation for internal validation and overfitting control.
Performance metrics: Sensitivity, specificity, accuracy, and ROC-AUC; a confusion matrix was reported for the DNN on the full dataset. An additional subject-level analysis was performed using one sample per subject (n=83).
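
The signal-processing step described above (baseline standardization followed by four time-domain features per sensor) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the summary does not say how signals were standardized, so subtracting the mean of the delay-phase readings is assumed here, as is the use of population (rather than sample) standard deviation and variance. The function names and the `n_baseline` parameter are hypothetical.

```python
import statistics
from typing import List, Sequence

N_SENSORS = 10  # S1..S10, from the text
FEATURE_NAMES = ("max", "median", "std", "var")  # from the text


def sensor_features(signal: Sequence[float], n_baseline: int) -> List[float]:
    """Four time-domain features for one sensor trace.

    Assumption: "standardized to baseline" is implemented as subtracting
    the mean of the first n_baseline readings (the delay phase).
    """
    baseline = statistics.fmean(signal[:n_baseline])
    response = [v - baseline for v in signal]
    return [
        max(response),
        statistics.median(response),
        statistics.pstdev(response),     # population std (assumption)
        statistics.pvariance(response),  # population variance (assumption)
    ]


def breath_print(signals: Sequence[Sequence[float]], n_baseline: int) -> List[float]:
    """Concatenate per-sensor features into one 10 x 4 = 40-element vector."""
    if len(signals) != N_SENSORS:
        raise ValueError(f"expected {N_SENSORS} sensor traces, got {len(signals)}")
    vector: List[float] = []
    for signal in signals:
        vector.extend(sensor_features(signal, n_baseline))
    return vector
```

The resulting 40-element vector matches the "10 sensors × 4 features" input size given for the DNN.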
Additional characterizations: SEM imaging of HEPA filter fibers; limited GC–MS profiling (Thermo Fisher QG 7000) of select breath samples to explore VOC composition; sensor cross-sensitivity to humidity and temperature assessed (supplementary).

Key Findings
  • Dataset: 615 valid breath samples (333 RT-qPCR-positive; 282 RT-qPCR-negative) from 83 patients (43 positive; 40 negative). Most positives were asymptomatic (≈79–80%).
  • Overall ML performance range: On the testing datasets, LDA, SVM, MLP, and DNN achieved accuracy of 88–96%, sensitivity of 86–94%, and specificity of 88–95%.
  • DNN performance (all data; confusion matrix): TP=318, FP=12, FN=15, TN=270; Total=615.
    • Sensitivity: 95.5% (95% CI: 92.7–97.3%).
    • Specificity: 95.7% (95% CI: 92.7–97.5%).
    • Accuracy: 95.6% (95% CI: 93.7–97.1%).
    • ROC-AUC: Training 98.76%, Testing 96.87%.
  • Subject-level analysis (n=83, one sample each) also showed strong performance with DNN (AUC training 99.9%, testing 96.9%).
  • Safety and contamination control: Post-use RT-qPCR of intake tubing was negative, supporting effective HEPA filtration.
  • Sensor behavior: Distinct signal patterns across the 10 MOS sensors for positive vs negative breaths; sensitivity to acetone validated in a controlled setup (S2, S8, S9 responsive; S3, S7 less responsive).
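
The DNN point estimates listed above follow directly from the reported confusion matrix, as the short sketch below shows. The summary does not state which method produced the 95% confidence intervals; the Wilson score interval used here is an assumption, chosen because it lands close to the reported bounds, and the function name `wilson_interval` is introduced only for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Confusion matrix reported for the DNN on all 615 samples.
TP, FP, FN, TN = 318, 12, 15, 270

sensitivity = TP / (TP + FN)                # 318 / 333, about 95.5%
specificity = TN / (TN + FP)                # 270 / 282, about 95.7%
accuracy = (TP + TN) / (TP + FP + FN + TN)  # 588 / 615, about 95.6%

sens_ci = wilson_interval(TP, TP + FN)
spec_ci = wilson_interval(TN, TN + FP)
acc_ci = wilson_interval(TP + TN, TP + FP + FN + TN)
```

Because the interval method is assumed, small differences in the last reported digit are to be expected. These intervals also treat the 615 samples as independent, which the Limitations section notes is not strictly the case.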
Discussion

GeNose C19, integrating a 10-sensor MOS array with supervised machine learning, effectively discriminated exhaled breath VOC patterns of RT-qPCR-confirmed COVID-19 positives vs negatives, addressing the need for rapid, noninvasive, and low-cost screening. The DNN model provided the most stable and accurate performance with minimal train-test performance gaps and high ROC-AUC, indicating good generalization within the study setting. Compared to existing screening modalities (antigen tests, rapid molecular assays, chest CT), GeNose C19 offers noninvasiveness, speed (~3 minutes from sampling to decision), and ease of use, making it suitable for mass screening and triage. Confounding factors (diet, smoking, comorbidities, ambient VOCs, humidity/temperature) and sampling protocols can influence breath prints; the study minimized these through standardized end-tidal sampling, preconditioning, and environmental controls. The device’s safety measures (HEPA filtration) and cross-sensitivity management (environmental monitoring, operating range constraints) support clinical feasibility. While mass spectrometry studies seek specific biomarkers and report heterogeneous VOC signatures across settings, the pattern-recognition approach circumvents the need for universal biomarkers, focusing instead on robust classification of complex VOC mixtures. Nonetheless, identifying stable biomarkers could further enhance sensor selectivity and model interpretability.

Conclusion

This study demonstrates proof-of-concept that a portable electronic nose (GeNose C19) combined with machine learning can rapidly and noninvasively screen for COVID-19 by recognizing breath-print patterns, achieving high sensitivity, specificity, and accuracy against RT-qPCR. Contributions include: (1) development of an integrated breath sampling and sensing system with safety filtration, (2) standardized end-tidal breath collection protocol, (3) robust feature extraction and ML pipeline with DNN achieving top performance, and (4) validation on 615 breath samples from 83 patients in clinical settings. Future work should include large-scale, double-blind, cross-sectional diagnostic studies with subject-level independence, exploration and validation of specific VOC biomarkers (potentially enabling molecularly imprinted selective sensors), calibration/drift correction across devices and sites, assessment of performance across variants and populations, and correlation with RT-qPCR Ct values and viral load to refine clinical utility and thresholds.

Limitations
  • Study design was open-label case-control; operators knew RT-qPCR status, introducing potential bias.
  • Non-independence of samples: multiple breath samples per subject (615 samples from 83 patients) may inflate performance metrics; subject-level validation was limited.
  • Sample size and cohort diversity were modest; most positives were asymptomatic and had few comorbidities, limiting generalizability and the analysis of confounders.
  • GC–MS characterization was performed on a small subset; specific VOC biomarkers were not conclusively identified, and findings varied across external studies.
  • Environmental influences (humidity, temperature, ambient VOCs) and sensor drift may affect readings despite controls; long-term stability and inter-device calibration need further evaluation.
  • No direct correlation established between breath VOC patterns and viral load (Ct values); temporal dynamics of VOCs during disease progression require more data.
  • Potential overfitting risks for some alternative ML models (e.g., gradient boosting, decision trees) were noted; external validation across sites is pending.
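
The non-independence concern listed above arises when breath samples from the same patient land in both the training and testing sets. One common way to avoid this is to split by subject rather than by sample. The sketch below illustrates that idea only; it is not the split used in the study, and `split_by_subject`, its parameters, and the synthetic subject IDs in the usage example are hypothetical.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_by_subject(
    subject_ids: Sequence[Hashable],
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) so that no subject appears in both sets.

    subject_ids[i] is the subject who produced sample i. The fraction
    applies to subjects, so the share of samples in the test set may differ.
    """
    subjects = list(dict.fromkeys(subject_ids))  # unique, first-seen order
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Usage with synthetic IDs: 615 samples spread over 83 subjects.
ids = [i % 83 for i in range(615)]
train_idx, test_idx = split_by_subject(ids)
```

With this kind of split, test performance reflects how the classifier handles patients it has never seen, which is the quantity a screening deployment depends on.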