Feasibility of continuous fever monitoring using wearable devices

B. L. Smarr, K. Aschbacher, et al.

This study by Benjamin L. Smarr and colleagues explores whether wearable peripheral temperature sensors can support continuous fever monitoring in COVID-19 patients. It shows that these devices can track illness-associated temperature changes, which could strengthen public health monitoring.

Introduction

The study investigates whether continuous peripheral (finger) temperature measured by wearable devices can feasibly detect and predict fever associated with illness, particularly COVID-19. Traditional single time-point thermometry lacks sensitivity, especially at illness onset, and fails to account for intra- and inter-individual variability due to circadian rhythms, menstrual cycle phases, and other biological rhythms. The authors hypothesize that continuous, individualized temperature time series from wearables, coupled with contextual data, can overcome limitations of single-point measures to detect fever and potentially predict illness onset, supporting broader public health monitoring during pandemics.

Literature Review

Existing approaches using single-point temperature checks (e.g., at workplaces or travel points) have limited sensitivity for early disease detection and may contribute to missed cases. Prior skepticism about distal skin temperature’s utility for fever detection was based on single-point comparisons to core measures. Biological rhythms (circadian, ultradian, menstrual) substantially modulate body temperature, necessitating individualized baselines. Some consumer wearables lack temperature sensors, despite evidence that distal temperature reflects physiologically meaningful rhythms. Prior work has shown improved fever detection with digital thermometers and that HR/HRV relate to autonomic tone, but integrating multiple physiological signals with continuous temperature data for fever detection in real-world populations had not been demonstrated before this study.

Methodology

Design: Feasibility analysis within the TemPredict study using continuous wearable data linked to self-reported COVID-19 symptom timing.

Participants: Adults (≥18 years) who owned an Oura ring and consented via UCSF IRB-approved procedures. An initial 110 respondents who reported COVID-19-like symptoms prior to study enrollment were screened; exclusions for missing or insufficient data windows, inability to locate data, pre-2020 symptom dates, or pseudoreplication yielded 50 analyzable cases. Most participants resided in the US (66%), 66% were male, mean age was 43.7 years (SD 11.0), and education and race/ethnicity varied. Six participants worked in patient-facing healthcare roles.

Physiological measures: Oura ring sensors recorded:
- Finger skin temperature via an NTC thermistor (non-calibrated, 0.07 °C resolution) at 1-min intervals (palm side of the finger base).
- Photoplethysmography (PPG)-derived metrics: respiration rate (RR) at 30 s resolution; heart rate (HR) as the 5-min mean inter-beat interval (IBI); heart rate variability (HRV) as the root mean square of successive differences (RMSSD) per 5-min IBI. Raw PPG was not stored.

Self-report measures: Intake demographics and retrospective COVID-19 illness information (date of symptom onset, recovery date, symptom list), plus a daily symptom survey via the Oura app. Analyses isolated the symptom “fever” and defined individual “symptom windows” from reported onset to recovery.

Data preparation and alignment: Data for HR, HRV, temperature (T), and RR were aligned by day of reported symptom onset, spanning 45 days prior through 20 days after (65 days in total). Linear interpolation standardized all variables to 1-min resolution. Baseline was defined as days −40 to −1; the symptom window had an individualized duration (population mean 9.3 days). To assess individual overlap and variability, T means and SDs were computed for the baseline and symptom windows.

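
The alignment and resampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code (their analysis was done in Matlab): the function names `align_to_onset` and `baseline_mask`, the minutes-based time axis, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

MINUTES_PER_DAY = 1440
DAYS_BEFORE, DAYS_AFTER = 45, 20  # window around reported symptom onset


def align_to_onset(t_minutes, values, onset_minute):
    """Resample one signal onto a 1-min grid spanning -45..+20 days around onset.

    t_minutes: sample times in minutes, sorted ascending (spacing may be irregular)
    values: samples recorded at those times
    onset_minute: reported symptom onset on the same clock (hypothetical input)
    Returns (rel_days, resampled): grid in days relative to onset, and values.
    """
    t = np.asarray(t_minutes, dtype=float)
    v = np.asarray(values, dtype=float)
    grid = np.arange(-DAYS_BEFORE * MINUTES_PER_DAY, DAYS_AFTER * MINUTES_PER_DAY)
    # Linear interpolation; grid points outside the recorded range become NaN
    resampled = np.interp(grid + onset_minute, t, v, left=np.nan, right=np.nan)
    return grid / MINUTES_PER_DAY, resampled


def baseline_mask(rel_days):
    """Boolean mask selecting the baseline window, days -40 through -1."""
    return (rel_days >= -40) & (rel_days < 0)
```

Applying the same function to T, HR, HRV, and RR places all four signals on one shared 65-day, 1-min grid, so the baseline and symptom windows can be selected with simple masks.
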
Feature extraction and normalization: To capture daily extrema robustly, representative daily minimum and maximum values were computed as the median of the lowest and highest 360 minutes within each 24 h period, reducing sensitivity to outliers. Variables were z-scored and normalized so that each individual’s baseline daily min and max ranged from −1 to +1.

Digital biomarker thresholds (feasibility-focused, not optimized): A day was flagged as fever-like if daily max T > 1.2 or daily min T > −0.2 (normalized units). Individuals were first grouped by self-reported fever (yes/no), then re-sorted by presence or absence of temperature-derived fever-like days within the symptom window.

Statistical analyses: Differences between baseline and symptom periods were assessed using nonparametric tests (Wilcoxon rank-sum, Kruskal–Wallis with Tukey–Kramer post hoc; Bonferroni-corrected where applicable). Correlations used Pearson’s r on the simplified daily min/max datasets. Multivariate clustering used only timepoints with original (non-interpolated) observations for all variables (primarily nocturnal due to device sampling).

Wavelet analysis: A continuous wavelet transform (Morse wavelet, β=5, γ=3) extracted approximately circadian power (ACP) in the 22–26 h band. Edge artifacts were mitigated by excluding events within two periods of the data edges. ACP peaks were identified and related to subsequent daytime fever-like events; events with missing or artifactual data were removed after visual quality control. The relationship between ACP peak height and days to fever-like onset was quantified.

Ethics and data availability: UCSF IRB-approved. Data and MATLAB code are available via the UCSD Research Data Library (DOI: 10.6075/J0ZW1JFX).
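
The daily-extrema and threshold steps above can be illustrated with a short sketch. The 360-minute medians and the 1.2 / −0.2 thresholds come from the summary; the normalization shown is only one plausible reading (an assumption): a linear map that sends the baseline median daily minimum to −1 and the baseline median daily maximum to +1. The function names are hypothetical, and the sketch assumes complete days with no missing minutes.

```python
import numpy as np

MINUTES_PER_DAY = 1440
K = 360  # number of lowest/highest minutes summarized per day


def daily_extrema(temp_1min):
    """Robust daily min/max: median of the K lowest and K highest minutes.

    temp_1min: 1-D array at 1-min resolution, length a multiple of 1440,
    assumed free of NaN (missing data would need separate handling).
    """
    days = np.asarray(temp_1min, dtype=float).reshape(-1, MINUTES_PER_DAY)
    ordered = np.sort(days, axis=1)
    daily_min = np.median(ordered[:, :K], axis=1)
    daily_max = np.median(ordered[:, -K:], axis=1)
    return daily_min, daily_max


def normalize(daily_min, daily_max, baseline_days):
    """ASSUMED mapping: baseline median min -> -1, baseline median max -> +1."""
    lo = np.median(daily_min[baseline_days])
    hi = np.median(daily_max[baseline_days])
    center, half_range = (hi + lo) / 2, (hi - lo) / 2
    return (daily_min - center) / half_range, (daily_max - center) / half_range


def fever_like(norm_min, norm_max, max_thresh=1.2, min_thresh=-0.2):
    """Flag days whose normalized max or min exceeds the stated thresholds."""
    return (norm_max > max_thresh) | (norm_min > min_thresh)
```

Under this reading, a day is flagged either when its peak exceeds the individual's usual peak by 10% of the baseline min-to-max range, or when its trough fails to drop as low as usual.
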

Key Findings

- Inter-individual variance: Considerable variability in mean finger skin temperature across individuals (population mean ± SD: 31.2 ± 1.7 °C). Mean T increased during symptom windows vs baseline by +0.63 ± 1.0 °C (Wilcoxon rank-sum, p=0.024), supporting feasibility of fever detection with distal temperature while indicating a universal single cutoff (e.g., 38 °C) is inappropriate.
- Time-series changes: Daily rhythms dominated intra-individual T variance. Participants reporting fever (n=38) showed elevated nightly maxima near symptom onset; those not reporting fever (n=12) did not show this rise in group averages.
- Digital biomarker thresholds: Using normalized daily max > 1.2 or daily min > −0.2 to flag “fever-like” days, 3/38 self-reported fever cases had no fever-like days within their symptom window, while 7/12 without reported fever had fever-like days within the window.
- Physiological corroboration: Sorting by temperature-derived fever-like days (detected vs not) yielded stronger and significant changes from baseline to the first week of symptoms across HR, HRV, and RR:
  - HR: 0.13 ± 0.28 to 1.45 ± 0.25 (p=0.02)
  - HRV: 0.26 ± 0.13 to 0.48 ± 0.14 (p=0.03)
  - RR: 0.06 ± 0.08 to 0.24 ± 0.07 (p=0.01)

  In contrast, sorting by self-reported fever showed significant change only in RR (−0.12 ± 0.10 to 0.30 ± 0.06; p=0.002), with HR and HRV differences not significant (p=0.13, p=0.33).
- Pre-symptom detection: 38/50 subjects exhibited fever-like days in the 45 days prior to reported symptom onset, often with coordinated changes across T, HR, HRV (decrease), and RR (increase), suggesting potential for early detection.
- Circadian power as predictor: Wavelet-derived ACP peaks (22–26 h band) preceded 226/244 (93%) daytime fever-like episodes by 1–7 days (mean lead 3 days). ACP peak height correlated positively with days until fever-like event (r=0.36, p=1×10^-7).
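
The counts reported above can be turned into proportions with a few lines of arithmetic. This is only a reader's check on the stated numbers; the variable names are illustrative, and self-reported fever is treated as a point of comparison, not as clinical ground truth.

```python
# Counts taken from the findings above
reported_fever, reported_without_flag = 38, 3   # self-reported fever; no fever-like day
no_reported_fever, unreported_with_flag = 12, 7  # no self-report; fever-like day present
acp_preceded, fever_like_episodes = 226, 244     # episodes preceded by an ACP peak

flagged_among_reported = (reported_fever - reported_without_flag) / reported_fever
flagged_among_unreported = unreported_with_flag / no_reported_fever
acp_lead_share = acp_preceded / fever_like_episodes

print(f"Fever reporters with a fever-like day: {flagged_among_reported:.1%}")      # 92.1%
print(f"Non-reporters with a fever-like day:   {flagged_among_unreported:.1%}")    # 58.3%
print(f"Episodes preceded by an ACP peak:      {acp_lead_share:.1%}")              # 92.6%
```

The last figure rounds to the 93% quoted in the findings. The 58.3% among non-reporters is the number behind the later suggestion that self-report may miss physiologic illness.
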

Discussion

The findings provide proof-of-concept that continuous distal temperature from consumer wearables can detect fever-associated physiological changes and support early identification of illness. Continuous individualized baselines overcome limitations of single-point thermometry by accounting for circadian and other biological variability. Temperature-derived digital biomarkers improved stratification of physiological disruptions (HR, HRV, RR) relative to self-reported fever, implying that self-reports may miss or misclassify physiologic illness and that some events labeled as asymptomatic may be unreported symptomatic episodes. Wavelet-derived disruption of circadian power preceding fever-like episodes demonstrates the potential to predict impending illness. Nonetheless, correlations between temperature and other metrics were modest and individual response trajectories heterogeneous, underscoring the need for multi-signal models and personalized baselines. Integrating temperature with PPG-derived HR/HRV/RR could yield superior illness detection. Broad public health applications will require larger, diverse datasets and careful handling of biases, data quality, and user adherence to ensure generalizability across heterogeneous populations.

Conclusion

This work demonstrates feasibility of continuous fever monitoring using wearable temperature sensors, showing that illness-associated temperature elevations and pre-symptomatic anomalies can be detected and that signal-processing features (circadian power disruptions) may enable prediction days in advance. The study supports integrating temperature sensors into consumer wearables and combining multiple physiological signals for robust illness detection. Future research should: develop and validate multi-metric digital biomarkers with larger, demographically diverse cohorts; include clinical ground truth (e.g., testing/serology); refine individualized baselines and thresholds; leverage advanced signal processing and machine learning while addressing fairness and bias; and establish best practices for participatory, privacy-conscious, large-scale wearable-based public health monitoring.

Limitations

- Small feasibility cohort (N=50) limits generalizability; thresholds were not optimized and serve only as proofs of concept.
- Reliance on retrospective self-reported symptom timing and fever; no serological or clinical confirmation to establish ground truth.
- Substantial heterogeneity in individual physiological trajectories; modest correlations between variables.
- Data quality and adherence issues in the larger pool reduced analyzable cases; missing data and device usage patterns (e.g., PPG sampling primarily during sleep) constrain analyses.
- Distal temperature sensors are non-calibrated and influenced by context; results depend on individualized normalization and may not transfer directly across devices or populations without calibration and personalization.
- Potential biases due to sample demographics and participation; need for diverse, representative cohorts to mitigate algorithmic bias.
