Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms

Medicine and Health

M. Gadaleta, J. M. Radin, et al.

This exciting study led by Matteo Gadaleta, Jennifer M. Radin, and their team reveals the promising potential of a machine learning model to accurately detect COVID-19 infections using data from wearable devices. The research demonstrates how scalable and passive monitoring can be achieved even without self-reported symptoms, marking a significant advancement in public health monitoring.

Introduction
The study addresses whether passively collected wearable sensor data, augmented when available by self-reported symptoms, can detect COVID-19 infection. Contextually, frequent identification, tracing, and isolation are essential for controlling SARS-CoV-2 spread, yet widespread diagnostic testing faces implementation and access challenges. Symptom self-reporting can be predictive but requires active engagement, misses asymptomatic cases, and delays pre-symptomatic detection. Commercial wearables passively monitor biometrics such as heart rate, sleep, and activity and have shown utility in COVID-19 detection, but prior studies often focus on a specific device or limited signals. This work aims to develop a device-agnostic, explainable machine learning algorithm that adapts to heterogeneous data availability across sensors and user engagement levels, potentially enabling broader, scalable early detection including in asymptomatic or non-reporting individuals.
Literature Review
Prior research indicates self-reported symptoms can predict COVID-19 positivity and encourage earlier testing. Wearable-derived signals have shown associations with infection, including changes in resting heart rate, sleep, activity, respiratory rate increases, and heart-rate-variability decreases, with some signals differentiating COVID-19 from other influenza-like illnesses. Early studies have demonstrated potential for pre-symptomatic and asymptomatic detection using smartwatch data. However, many prior works were limited to specific brands or predefined signals. These gaps motivate device-agnostic approaches that leverage all available wearable data and can function without self-reported symptoms.
Methodology
Study design and population: Adults (≥18 years) in the United States were eligible for the DETECT study. Participants downloaded the MyDataHelps iOS/Android app, provided electronic informed consent (IRB 20-7531), and could share wearable data (via the Fitbit direct API or Apple HealthKit/Google Fit), historical data from before enrollment, self-reported symptoms, COVID-19 diagnostic test results, vaccination status, and, optionally, electronic health records. Recruitment occurred via a study website, media, and partners (Walgreens, CVS/Aetna, Fitbit). Data were collected from March 25, 2020 to April 3, 2021.

Cohort definitions and outcomes: All participants reporting at least one COVID-19 nasal swab test were considered. A test was labeled negative if no positive test was reported from 60 days before to 60 days after that test date, and a minimum 60-day separation between tests from the same individual was enforced. Participants were grouped as symptomatic (at least one symptom reported from 15 days before up to the test date) or no-symptom-reported (no symptoms reported in that window). Analyses considered two temporal windows per test: (1) pre-test only (5 days before the test date) and (2) pre- and post-test (5 days before to 5 days after the test date), the latter to assess potential behavioral changes around testing.
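The cohort-assignment rules above (negative only if no positive test within ±60 days, 60-day minimum separation between tests, symptomatic if any symptom in the 15 days up to the test) can be sketched in code. This is an illustrative reconstruction, not the study's actual pipeline; all names (`label_tests`, `tests`, `symptom_dates`) are hypothetical.

```python
from datetime import date, timedelta

def label_tests(tests, symptom_dates):
    """Sketch of the cohort-labeling rules described above.

    tests: list of (test_date, result) tuples for one participant,
           result being "positive" or "negative".
    symptom_dates: set of dates on which any symptom was reported.
    Returns a list of (test_date, label, symptomatic) tuples.
    """
    positives = {d for d, r in tests if r == "positive"}
    labeled = []
    last_kept = None
    for test_date, result in sorted(tests):
        # Enforce a minimum 60-day separation between kept tests.
        if last_kept is not None and (test_date - last_kept).days < 60:
            continue
        if result == "positive":
            label = "Positive"
        else:
            # Negative only if no positive test within +/- 60 days.
            near_positive = any(abs((p - test_date).days) <= 60 for p in positives)
            if near_positive:
                continue  # ambiguous test, excluded
            label = "Negative"
        # Symptomatic if any symptom from 15 days before up to the test date.
        symptomatic = any(
            test_date - timedelta(days=15) <= d <= test_date
            for d in symptom_dates
        )
        labeled.append((test_date, label, symptomatic))
        last_kept = test_date
    return labeled
```

Note that a negative test reported close to a positive one is simply excluded rather than mislabeled, matching the ±60-day rule.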
Signals and features: Four categories were used: (1) sensor features from wearables, subdivided into activity (e.g., steps, distance, activity minutes, sedentary minutes, calories), heart rate (e.g., resting heart rate, maximum heart rate, HRV metrics), and sleep (e.g., total sleep time, time in bed, sleep efficiency); (2) symptom features as binary indicators for each reported symptom (e.g., fatigue, headache, dyspnea, GI symptoms, loss of taste/smell, cough, fever/chills/sweating, congestion/runny nose, neck pain, body aches, sore throat, stomach ache); (3) anthropometrics (e.g., BMI, height, weight, body fat percentage, basal metabolic rate) when available; and (4) demographics (age, gender). If multiple devices provided the same data type, the most used device in the reporting period was selected. Data validity required at least 50% availability in the baseline period.

Baseline and deviation computation: For each daily metric, a dynamic, individualized baseline was computed as a weighted average of past data, excluding the six most recent days to avoid contamination by acute changes. Exponentially decreasing weights emphasized more recent history while down-weighting older days; if a symptom was reported, weights were set to zero from the day of the symptom through the next 10 days when computing the baseline. Baseline variability was computed as a weighted variance over a 60-day horizon. The daily deviation (a normalized, z-score-like value) was defined as (DailyValue − Baseline) / BaselineVariability. These deviations provided features capturing departures from personal norms.

Modeling approach: An explainable gradient boosting model based on decision trees ingested all available features, naturally handling sparse and heterogeneous data arising from differing device capabilities and engagement levels. The model adapts to the presence or absence of self-reported symptoms by leveraging whatever signals are available.
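The baseline-and-deviation step above can be sketched as follows. This is a minimal illustration under stated assumptions: the decay constant `tau` and all names are hypothetical, and the paper's exact weighting scheme may differ in detail.

```python
import numpy as np

def daily_deviation(history, today_value, symptom_mask, tau=30.0):
    """Illustrative personalized-baseline deviation for one daily metric.

    history: daily values (e.g., resting heart rate), oldest first.
    today_value: today's observed value.
    symptom_mask: True for days within a symptom day or the 10 days after it.
    Returns a normalized (z-score-like) deviation, or None if no valid baseline.
    """
    history = np.asarray(history, dtype=float)
    n = len(history)
    # Exponentially decreasing weights: recent history counts more.
    ages = np.arange(n - 1, -1, -1)  # age in days; 0 = most recent
    weights = np.exp(-ages / tau)
    weights[-6:] = 0.0  # exclude the six most recent days
    weights[np.asarray(symptom_mask, bool)] = 0.0  # zero out symptomatic stretches
    if weights.sum() == 0:
        return None  # no usable baseline data
    baseline = np.average(history, weights=weights)
    variability = np.sqrt(np.average((history - baseline) ** 2, weights=weights))
    if variability == 0:
        return None
    # Deviation from the personal norm, scaled by baseline variability.
    return (today_value - baseline) / variability
```

A large positive deviation in resting heart rate, for example, would then feed into the gradient boosting model as one feature among many.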
Feature importance was quantified via average prediction changes under feature perturbation and aggregated into macro categories (symptoms, activity, heart rate, sleep, anthropometrics, demographics).

Evaluation: The model was trained and evaluated separately for the symptomatic and no-symptom-reported cohorts and under the two temporal windows (pre-test only vs. pre- and post-test). Performance was summarized using ROC AUC with 95% CIs; figures additionally reported sensitivity, specificity, PPV, and NPV. Statistical comparisons of model outputs between COVID-19-positive and -negative groups used one-sided Mann-Whitney tests. Symptom frequency differences were tested via two-sided Fisher's exact tests with 95% CIs.
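The evaluation described here (ROC AUC plus a one-sided Mann-Whitney comparison of model scores) can be sketched with standard library calls. The scores below are synthetic stand-ins, not study data.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score

# Synthetic model outputs: negatives centered lower than positives.
rng = np.random.default_rng(0)
neg_scores = rng.normal(0.3, 0.10, 500)   # scores for COVID-19-negative tests
pos_scores = rng.normal(0.6, 0.15, 100)   # scores for COVID-19-positive tests

y_true = np.concatenate([np.zeros(500), np.ones(100)])
y_score = np.concatenate([neg_scores, pos_scores])

# Discrimination summarized as ROC AUC.
auc = roc_auc_score(y_true, y_score)

# One-sided test: are positive scores stochastically greater than negative ones?
stat, p = mannwhitneyu(pos_scores, neg_scores, alternative="greater")

print(f"AUC = {auc:.3f}, one-sided Mann-Whitney p = {p:.2e}")
```

Confidence intervals for the AUC (as reported in the paper) would typically come from bootstrapping the test set, which is omitted here for brevity.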
Key Findings
- Enrollment and testing: 38,911 participants enrolled (61% female; 15–19% over 65, depending on the report section). In the abstract, 1,118 reported a positive test and 7,032 a negative test by PCR; across the full study period, 18,175 tests were reported (1,360 positive, 16,938 negative, 471 unknown), with some participants reporting multiple tests.
- Discrimination performance:
  - Symptomatic cohort: AUC 0.83 (95% CI 0.81–0.85) when including 5 days post-test; AUC 0.78 (0.75–0.80) using pre-test data only. Example operating points reported in figures showed sensitivity ~0.78 and specificity ~0.70 (post-test included), with NPV up to ~0.90.
  - No-symptom-reported cohort: AUC 0.74 (0.72–0.76) including post-test; AUC 0.66 (0.64–0.68) pre-test only, with high NPV (~0.93–0.95) and low PPV, consistent with lower prevalence and weaker signal.
  - All-individuals analysis without symptom features (abstract): AUC 0.70 (0.66–0.79) including post-test; AUC 0.70 (0.69–0.72) pre-test only.
- Separation of positives vs. negatives: Model outputs differed significantly between COVID-19-positive and -negative individuals across cohorts and temporal windows (Mann-Whitney p < 0.01), with clearer separation in symptomatic cases.
- Feature importance:
  - In the symptomatic cohort (pre-test only), self-reported symptoms contributed ~60% of model importance; including post-test days, symptom importance decreased to ~46% as behavioral changes influenced sensor features.
  - In the no-symptom-reported cohort, activity features gained relative importance (from ~46% to ~54% in the pre-test setting), while sleep feature importance remained relatively stable across windows.
  - Heart rate features contributed little (~6%) in the symptomatic cohort but increased to ~18% in the no-symptom-reported cohort when using pre-test data only.
  - Anthropometrics contributed modestly; demographics had negligible contribution.
- Symptom-level discriminators (symptomatic cohort): Loss of taste/smell, fever, chills, fatigue, headache, and muscle/body aches were among the most discriminative symptoms; symptoms like cough, sweating, and congestion/runny nose had lower and less consistent contributions (≤5% in some analyses).
Discussion
The decision tree-based gradient boosting model effectively integrates heterogeneous wearable-derived signals with demographics and, when available, self-reported symptoms to distinguish COVID-19-positive from -negative cases. The algorithm’s adaptability enables operation when symptom data are absent, addressing nearly half of positive individuals who may not report symptoms. Temporal analyses indicate that including post-test days can improve performance, likely reflecting behavioral changes (e.g., reduced activity) around testing and result receipt; nevertheless, clinically relevant discrimination is retained using only pre-test data, supporting early detection utility. Symptom features dominate when available, underscoring the value of low-burden mechanisms for symptom reporting, while in their absence, activity and heart rate features become more informative. The model’s device-agnostic design supports broad scalability across diverse wearables and data completeness scenarios, enabling application in settings where symptom collection is infeasible and facilitating passive population monitoring. The findings align with and extend prior evidence that wearable-derived physiological deviations from personalized baselines can signal acute infection, including pre-symptomatic periods.
Conclusion
This work demonstrates that an explainable, device-agnostic gradient boosting model can detect COVID-19 infection using passively collected wearable sensor data, with enhanced accuracy when self-reported symptoms are available. In symptomatic individuals, the model achieved AUC up to 0.83 when including post-test days and 0.78 using pre-test data only; in those without symptom reports, it maintained meaningful discrimination (AUC up to 0.74 including post-test; 0.66 pre-test). Feature importance analyses highlight the primacy of symptoms when present and the compensatory roles of activity and heart rate signals otherwise. The platform can scale across devices and user engagement levels, supporting passive monitoring and application in settings lacking symptom data collection. Future work should incorporate richer intra-day and advanced metrics (e.g., respiratory rate, peripheral temperature, HRV), validate pre-symptomatic detection at larger scale, and address equity and access considerations to ensure generalizable, inclusive deployment.
Limitations
- Data heterogeneity and sparsity: Wearable data availability varied by device and user engagement; many participants lacked advanced metrics (respiratory rate, peripheral temperature, detailed HRV), limiting their contribution.
- Aggregation level: Analyses primarily used daily aggregates; finer intra-day dynamics may carry additional predictive value not captured here.
- Self-report biases: Symptoms and test results are self-reported and subject to recall/reporting bias; timing inaccuracies can affect cohort assignment and temporal windows.
- Behavioral confounding: Including post-test data may capture behavior changes due to testing or result awareness, potentially inflating performance metrics relative to purely pre-test detection.
- Generalizability and equity: Reliance on wearables may introduce disparities due to unequal access and potential sensor accuracy differences across skin types; cohort demographics and participation patterns may limit generalizability.
- Inconsistent availability across devices: Device-agnostic modeling must contend with missing feature sets on less capable devices, which can reduce performance for some users.