Pulse oximetry values from 33,080 participants in the Apple Heart & Movement Study

Medicine and Health

I. Shapiro, J. Stein, et al.

This study, led by Ian Shapiro, Jeff Stein, Calum MacRae, and Michael O'Reilly, analyzes 72 million SpO₂ values from a diverse cohort of participants. It examines how demographic factors influence SpO₂ patterns and establishes healthy population norms for oxygen saturation.
Introduction

The study investigates population-level patterns and determinants of blood oxygen saturation (SpO₂) measured noninvasively via wearable reflectance pulse oximetry in free-living conditions. Arterial oxygen saturation (SaO₂) is a key indicator of cardiopulmonary function, but measuring it directly requires invasive sampling; pulse oximetry offers a practical surrogate. Prior work has shown typical healthy SpO₂ values of around 95–99% at sea level, with hypoxemia generally defined as values below 90% and single-point measures below 95% prompting further evaluation. Existing cross-sectional studies have consistently found that oxygen saturation declines with increasing age and body mass and rises with barometric pressure (that is, falls with increasing altitude); findings for sex and smoking effects have been mixed. Lower SpO₂ has been associated with increased cardiopulmonary risk and adverse outcomes in longitudinal studies. This study aims to quantify 24-hour circadian profiles and daytime versus nocturnal SpO₂, and to model their associations with age, BMI, sex, altitude, and race/ethnicity in a large, diverse cohort of Apple Watch users, thereby establishing population norms and comparing wearable-derived estimates with established laboratory reference models.

Literature Review

Prior cross-sectional research has shown negative correlations of SaO₂/SpO₂ with age and BMI, and positive associations with barometric pressure consistent with the alveolar gas equation. Reports on sex differences have been inconsistent (some find a positive association for females, others a null or negative one), as have findings on smoking (some find lower SpO₂ in current smokers, others no significant effect). Single-point low SpO₂ has been linked to elevated long-term cardiopulmonary risk (e.g., the Tromsø Study reported higher risk for SpO₂ <92% and 93–95% vs 96–100%). Other work has used daytime SpO₂ to predict hypertension and has related SpO₂ to impaired left ventricular filling and to blood pressure metrics, including morning surges; associations of nocturnal SpO₂ with atherosclerotic cardiovascular disease are mixed. Few prior studies have characterized full circadian SpO₂ profiles in healthy adults; studies in children and in young adults at high altitude describe sinusoidal daily patterns with a nocturnal nadir and a midday peak, in line with the circadian variation seen in pulmonary function tests.

Methodology

Design and data source: Cross-sectional analysis of SpO₂ data from the Apple Heart and Movement Study (started Nov 14, 2019), conducted with the American Heart Association and Brigham and Women’s Hospital; IRB-approved (Advarra) and registered (NCT04198194). Participants were Apple Watch users ≥18 years residing in the U.S. who consented via the Apple Research app.

Inclusion/exclusion and cohort size: From 151,935 participants, those with incomplete demographics/location metadata were excluded (n=8,092). Only users of Apple Watch Series 6 with SpO₂ data between Jan 1, 2021 and Sept 15, 2021 were included; others excluded (n=92,062). Participants with fewer than 30 daytime and 300 nocturnal SpO₂ values were excluded (n=18,731). Final analytic cohort: 33,080 participants providing ~72.2 million measurements.

Data collection: SpO₂ was measured with the Apple Watch Series 6 reflectance pulse oximetry sensor through on-demand and passive background measurements (approximately every 30 minutes under low-motion conditions). Demographics (age, sex assigned at registration, race/ethnicity), height, weight, and location were self-reported in the app. Geographic information used ZIP code aggregates for privacy; approximate home altitude was derived by mapping ZIP codes to USGS mean surface elevation, and barometric pressure was estimated from that altitude via a reduced NASA pressure-altitude equation. BMI was calculated from height and weight. Due to small numbers, several race/ethnicity categories were combined into an 'Other' group for subgroup analyses.
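The altitude-to-pressure step can be illustrated with a short sketch. The summary does not give the exact "reduced" form the authors used, so the version below assumes the standard troposphere branch of NASA's Earth Atmosphere Model (valid below roughly 11,000 m). The function name and the example elevations are illustrative only.

```python
def pressure_kpa(altitude_m: float) -> float:
    """Approximate barometric pressure (kPa) at an altitude given in meters.

    Assumption: NASA Earth Atmosphere Model, troposphere branch.
    The study's "reduced" equation may differ in form or constants.
    """
    if altitude_m >= 11_000.0:
        raise ValueError("troposphere formula only applies below about 11,000 m")
    # Temperature (deg C) falls linearly with altitude in the troposphere.
    temp_c = 15.04 - 0.00649 * altitude_m
    # Pressure follows from the temperature ratio raised to a fixed exponent.
    return 101.29 * ((temp_c + 273.1) / 288.08) ** 5.256


if __name__ == "__main__":
    # Hypothetical home elevations in meters, not values from the study.
    for label, altitude in [("sea level", 0.0), ("about one mile", 1609.0)]:
        print(f"{label}: {pressure_kpa(altitude):.1f} kPa")
```

Because pressure falls monotonically with altitude in this model, a regression on home altitude and one on inferred pressure describe the same relationship with opposite signs.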

Data processing: For each subject, SpO₂ values were aggregated by hour of day to produce a 24-h profile independent of each subject’s measurement count or timing. Per-subject mean daytime SpO₂ (dSpO₂) was defined as the average of values between 11:00–18:59 local time; mean nocturnal SpO₂ (nSpO₂) as the average between 01:00–04:59. The day-night difference (dnSpO₂) was computed as dSpO₂ − nSpO₂.
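The per-subject aggregation described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each measurement has already been reduced to a `(local_hour, spo2)` pair, and it averages the raw values inside each window (the summary does not say whether the authors averaged raw values or hourly means). All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

DAY_HOURS = range(11, 19)   # 11:00-18:59 local time
NIGHT_HOURS = range(1, 5)   # 01:00-04:59 local time


def hourly_profile(samples):
    """Map hour of day -> mean SpO2 for one subject.

    Averaging within each hour first gives every hour equal weight,
    regardless of how many measurements fell into it.
    """
    by_hour = defaultdict(list)
    for hour, spo2 in samples:
        by_hour[hour].append(spo2)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def window_mean(samples, hours):
    """Mean SpO2 over the measurements whose hour falls in `hours`."""
    values = [spo2 for hour, spo2 in samples if hour in hours]
    return mean(values) if values else None


def summarize(samples):
    """Return dSpO2, nSpO2 and their difference dnSpO2 for one subject."""
    samples = list(samples)  # allow generators; the data is scanned twice
    day = window_mean(samples, DAY_HOURS)
    night = window_mean(samples, NIGHT_HOURS)
    diff = day - night if day is not None and night is not None else None
    return {"dSpO2": day, "nSpO2": night, "dnSpO2": diff}


if __name__ == "__main__":
    # Made-up measurements for a single subject.
    demo = [(12, 97.0), (13, 96.0), (2, 95.0), (3, 94.0), (8, 99.0)]
    print(hourly_profile(demo))
    print(summarize(demo))
```

Measurements outside both windows (the 08:00 value in the demo) contribute to the 24-hour profile but not to dSpO₂ or nSpO₂.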

Statistical analysis: Ordinary least squares (OLS) regression modeled dSpO₂ and nSpO₂ separately. Reference comparisons were made to prior SaO₂ models (e.g., Crapo et al.), using a reference model (Mref) with age, height, weight, sex, and inferred barometric pressure; a univariate low-altitude comparison using age only was also performed. The primary full-cohort model (M1) included linear terms for age (centered at 40 years), BMI (centered at 25 kg/m²), home altitude (centered), assigned sex (male=1, female=0), and race/ethnicity (categorical, White as reference). Stratified models were fit by sex (M1,sex) and by race/ethnicity groups. Interaction-term models were explored but did not improve fit relative to stratified analyses. Goodness-of-fit and coefficient significance were assessed; Welch’s unequal variances t tests with Bonferroni correction (p<0.0005) were used to compare coefficients across stratified models. Confidence intervals were generally 95%, and 99.5% CIs were used for circadian profile error whiskers. Mixed-effects variance components analyses (Supplementary) supported that subject-to-subject differences dominate variance. Analyses were conducted in Python (Matplotlib, Seaborn) with specified versions; code available upon request subject to protocol constraints.
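To make the M1 covariate encoding concrete, here is a minimal sketch of a centered design row and an ordinary least squares fit. It is written in plain Python so it runs without dependencies; the study's own fitting code is not described in this summary. The altitude centering value, the race/ethnicity level names, and both function names are assumptions made for illustration.

```python
def design_row(age, bmi, altitude_m, is_male, race,
               altitude_center_m, race_levels):
    """Build one M1-style row: intercept, centered covariates, dummies.

    `altitude_center_m` and `race_levels` (the non-reference categories)
    are hypothetical inputs; the reference category gets all-zero dummies.
    """
    row = [
        1.0,                             # intercept
        age - 40.0,                      # age centered at 40 years
        bmi - 25.0,                      # BMI centered at 25 kg/m^2
        altitude_m - altitude_center_m,  # centered home altitude
        1.0 if is_male else 0.0,         # male = 1, female = 0
    ]
    row += [1.0 if race == level else 0.0 for level in race_levels]
    return row


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

With this centering, the intercept is the predicted SpO₂ for a 40-year-old female in the reference race/ethnicity group with a BMI of 25 kg/m² living at the centering altitude, which is what makes constant terms comparable across stratified models.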

Key Findings
  • Scale and pattern: Across all demographic strata, 24-hour SpO₂ profiles were approximately sinusoidal, with nadir near midnight to early morning (around 01:00) and zenith near midday (around 11:00–12:00). The full cohort showed a mean diurnal range of about 1% saturation; nocturnal values were on average ~0.8–0.9% lower than daytime.
  • Descriptive statistics: Mean (±SD) nSpO₂ was 95.6% ± 1.3; mean dSpO₂ was 96.2% ± 1.1. The cohort included 33,080 subjects contributing ~72.2 million measurements.
  • Age, BMI, altitude effects: In full-cohort linear models (M1), age, BMI, and home altitude coefficients were highly significant for both dSpO₂ and nSpO₂. SpO₂ decreased with increasing age and BMI and increased with higher barometric pressure (lower altitude). The age-dependent decline measured for the cohort (approximately −0.031% SpO₂ per year) closely matched prior reports (~−0.036%, −0.027%, and −0.020% per year in earlier studies). Effect sizes were generally larger and model fit (R²) higher for nocturnal SpO₂ than daytime.
  • Sex effects: In the nocturnal model, males had 0.16% higher nSpO₂ than females (99.5% CI 0.11–0.22, P=2.7×10⁻⁶). In the daytime model, females had 0.05% higher dSpO₂ than males (99.5% CI 0.01–0.09, P=4.1×10⁻⁷). Sex-stratified models showed significantly different slopes by sex: females exhibited a greater decline with age, while males showed a larger decrease with increasing BMI. Constant terms and altitude coefficients did not differ significantly between sexes.
  • Race/ethnicity: No categorical race/ethnicity variables were significant in nocturnal models. In daytime models, small but significant differences were detected; the largest indicated approximately 0.13% lower dSpO₂ for Hispanic participants relative to White participants. Additional subgroup models suggested some differences in age- and sex-related slopes between groups, but no significant differences in constant terms for either daytime or nocturnal SpO₂ across race/ethnicity groups.
  • Concordance with reference models: Daytime SpO₂ regression coefficients were in close quantitative agreement with published reference SaO₂ models measured under controlled conditions, supporting validity of wearable-derived estimates in free-living settings.
  • Additional observations: Subgroups with lower daytime SpO₂ tended to show a larger day-night drop. The three strongest predictors (age, BMI, altitude) also predicted the day-night difference (dnSpO₂). R² values were higher for nocturnal than daytime models, indicating better explained variance at night.
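
One way to summarize an approximately sinusoidal 24-hour profile is a single-harmonic fit, which yields a mean level, an amplitude, and a peak hour. The summary does not say the authors fitted such a curve, so the sketch below is an illustrative assumption rather than their method, and the demo profile is synthetic.

```python
import math


def fit_daily_sinusoid(hourly_means):
    """Fit y(t) = mesor + amplitude * cos(2*pi*(t - peak_hour) / 24).

    hourly_means: 24 values, index = hour of day (0..23).
    With exactly one value per hour over a full cycle, the least-squares
    solution reduces to the first Fourier harmonic computed below.
    """
    if len(hourly_means) != 24:
        raise ValueError("expected one mean per hour (24 values)")
    omega = 2.0 * math.pi / 24.0
    mesor = sum(hourly_means) / 24.0
    a = sum(y * math.cos(omega * t) for t, y in enumerate(hourly_means)) / 12.0
    b = sum(y * math.sin(omega * t) for t, y in enumerate(hourly_means)) / 12.0
    amplitude = math.hypot(a, b)
    peak_hour = (math.atan2(b, a) / omega) % 24.0
    return mesor, amplitude, peak_hour


if __name__ == "__main__":
    # Synthetic profile: mean 96%, peak at noon, peak-to-trough range 1%.
    profile = [96.0 + 0.5 * math.cos(2.0 * math.pi * (t - 12) / 24.0)
               for t in range(24)]
    mesor, amplitude, peak_hour = fit_daily_sinusoid(profile)
    print(f"mesor={mesor:.2f}%  range={2 * amplitude:.2f}%  peak={peak_hour:.1f} h")
```

The diurnal range is twice the fitted amplitude, and the nadir of a single-harmonic fit always falls 12 hours after the peak, so a profile whose nadir and zenith are about 10 to 11 hours apart (as reported above) is only approximately captured by this form.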
Discussion

The findings demonstrate consistent circadian SpO₂ patterns across diverse demographic groups, aligning with prior limited literature on diurnal oxygenation rhythms and with known circadian variation in pulmonary function. Wearable reflectance pulse oximetry in naturalistic settings reproduced expected associations: lower SpO₂ with advancing age and higher BMI, and higher SpO₂ at higher barometric pressures (lower altitudes). Stronger effect sizes and higher model fit during nocturnal windows suggest that sleep conditions reduce behavioral confounds and accentuate physiological drivers, making nocturnal SpO₂ particularly informative for detecting systematic differences.

Sex-stratified analyses revealed novel differences in age- and BMI-dependent trends: females showed a steeper age-related decline, while males exhibited a stronger adverse association with increasing BMI. Potential mechanisms include sex-related differences in fat distribution (e.g., greater visceral adiposity in males affecting ventilatory mechanics) and hormonal influences on pulmonary function. Race/ethnicity differences were minimal and limited to small daytime effects, with no significant nocturnal differences and no evidence of clinically meaningful bias in high-saturation ranges; however, limited hypoxic data precluded assessment of potential biases at low saturations.

Overall, the results support the use of large-scale wearable SpO₂ data to establish healthy population norms and to replicate laboratory-based relationships under real-world conditions. The prominence of nocturnal effects underscores the potential of sleep-period measurements for risk stratification and monitoring.

Conclusion

This study establishes population-level norms and determinants of SpO₂ using more than 72 million measurements from 33,080 Apple Watch users, revealing robust circadian patterns and confirming known associations with age, BMI, and barometric pressure. Daytime wearable-derived regression coefficients closely matched laboratory reference models, and nocturnal measurements provided stronger model fits and larger effect sizes. Small daytime differences by race/ethnicity were detected, with no significant nocturnal differences; sex-stratified analyses uncovered distinct age and BMI trends by sex. These findings highlight the value of wearable SpO₂ for scalable, real-world cardiopulmonary assessment and suggest that nocturnal measurements may be particularly informative. Future work should investigate unexplained components of nocturnal SpO₂ decline, incorporate precise sleep–wake alignment and clinical validation in hypoxic ranges, examine disease-specific subgroups, and further assess potential measurement biases across skin tones and at low saturation levels.

Limitations
  • Demographic imbalances and self-selection within the cohort; self-reported metadata (age, height, weight, sex, race/ethnicity, location) were not independently verified.
  • No exclusions for cardiopulmonary risk factors or chronic conditions (e.g., COPD, sleep apnea), smoking, or other behaviors; COVID-19 pandemic during data collection may have introduced acute illness effects.
  • Uncontrolled, naturalistic measurement conditions with unknown covariates; potential mixing of awake and asleep measurements within defined time windows, though hour-specific modeling showed stable coefficients.
  • Limited hypoxic data (2.5% of values <90%, 0.29% <85%), constraining assessment of device behavior and potential bias in low-saturation ranges.
  • Home altitude inferred from aggregated ZIP codes; potential misclassification of barometric pressure and altitude.
  • Apple Watch Series 6 is not a co-oximeter and cannot account for dyshemoglobins (carboxyhemoglobin, methemoglobin, sulfhemoglobin).
  • Use of a single-day window per participant in some analyses may introduce bias; reducing the window did not materially change results but residual biases may remain.
  • Small counts in certain race/ethnicity categories necessitated grouping into 'Other,' limiting subgroup granularity.