Medicine and Health
Using smart speakers to contactlessly monitor heart rhythms
A. Wang, D. Nguyen, et al.
The study addresses whether commodity smart speakers can be transformed into contactless sensors to acquire beat-to-beat cardiac rhythms (R–R intervals), including irregular rhythms such as atrial fibrillation, with clinically useful accuracy. Conventional ECG accurately measures R–R intervals but requires skin contact, which can be inconvenient for contagious or quarantined patients and for those with skin sensitivities, and is poorly suited to telemedicine. Modern smart speakers have multiple microphones and can emit inaudible acoustic signals, creating an opportunity for contactless monitoring without cameras and their associated privacy concerns. The authors propose an active sonar approach in which a smart speaker emits 18–22 kHz signals and analyzes the echo from the chest to detect sub-millimeter surface motions caused by heartbeats. The goal is to extract individual beats and compute heart rate and R–R intervals for healthy individuals and for hospitalized patients with both regular and irregular rhythms, thereby enabling applications in arrhythmia screening and HRV analysis.
Prior contactless methods include Doppler radar and optical cardiography, which require specialized hardware not ubiquitous in homes or hospitals. Camera-based facial photoplethysmography can estimate pulse but raises privacy concerns. Recent smart device sonar work has extracted respiration and, over short distances, assumed regular heart rhythms to estimate average heart rate via frequency-domain peaks. However, frequency-domain methods fail with irregular rhythms where energy is spread across frequencies. Commodity smart speakers also pose challenges: limited inaudible bandwidth (18–22 kHz), 48 kHz sampling rates, and lower SNR compared to ultrasonic systems. Breathing motions are large, non-sinusoidal, and introduce harmonics in heart-rate bands, complicating isolation of heartbeat-induced motion. These limitations motivate new signal processing—beyond simple filtering—to separate heartbeat motion from respiration and noise and to handle irregular rhythms.
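The bandwidth constraint can be made concrete with two standard FMCW relations (textbook physics, not taken from the paper): range resolution c/(2B) and the round-trip phase change 4πd/λ for a reflector that moves by d. The minimal sketch below assumes a speed of sound of 343 m/s; `range_resolution_m` and `phase_shift_rad` are hypothetical helper names.

```python
import math

C = 343.0  # speed of sound in air (m/s), assumed (~20 °C)


def range_resolution_m(bandwidth_hz: float, c: float = C) -> float:
    """FMCW range resolution: c / (2 * B)."""
    return c / (2.0 * bandwidth_hz)


def phase_shift_rad(displacement_m: float, freq_hz: float, c: float = C) -> float:
    """Round-trip phase change 4*pi*d / wavelength for a reflector moving by d."""
    wavelength_m = c / freq_hz
    return 4.0 * math.pi * displacement_m / wavelength_m


res = range_resolution_m(22_000 - 18_000)           # 4 kHz inaudible sweep
phi = phase_shift_rad(0.1e-3, 20_000)               # 0.1 mm motion at 20 kHz
print(f"range resolution: {res * 1000:.1f} mm")     # 42.9 mm
print(f"phase shift: {math.degrees(phi):.1f} deg")  # 4.2 deg
```

A 4 kHz sweep separates reflectors only about 4.3 cm apart, far coarser than sub-millimeter chest motion, whereas a 0.1 mm displacement at 20 kHz rotates the echo phase by about 4°. This is why sonar systems of this kind generally track echo phase (the in-phase and quadrature components) rather than range bins.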
System and signal processing: The smart speaker system emits a linear FMCW chirp sweeping 18–22 kHz with a 50 ms duration, repeated continuously, and records at 48 kHz with a seven-microphone array. Processing proceeds in six steps:
- (1) Filtering removes audible frequencies.
- (2) The received FMCW signal at each microphone is converted to an impulse response via frequency-domain equalization.
- (3) Echo suppression uses a raised-cosine window to attenuate reflections beyond ~1 m, improving sensitivity to small chest motions.
- (4) Adaptive, learning-based beamforming maximizes the signal-to-interference-plus-noise ratio (SINR) of the heartbeat signal. The beamformer applies complex weights across microphones and frequencies, optimized by gradient ascent to align heartbeat components while minimizing interference from respiration and ambient noise. Regularization terms penalize abrupt, impulse-like changes (e.g., sudden breaths or motion) and stabilize the heartbeat envelope. Filter banks separate the lower-frequency respiration band (<~60 cpm) from the higher-frequency heartbeat band (~60–150 cpm) to guide the objective.
- (5) The complex beamformed signal is segmented into individual beats using a segment-to-segment comparison that accounts for residual respiration-induced rotation between the in-phase and quadrature components and for variable R–R intervals. The algorithm identifies segment boundaries and takes per-beat timing from segment midpoints to derive heart rate and R–R intervals.
- (6) Data stream synchronization aligns acoustic-derived beats with ground-truth ECG/PPG timing using manual offset correction and error-correcting matching; unmatched intervals are interpolated in sensitivity analyses.
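The front end of steps (1)–(3) can be illustrated for a single microphone with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a toy "stop-and-hop" simulator (each reflector treated as stationary within one chirp), a speed of sound of 343 m/s, and a gate that passes ranges up to 0.8 m and tapers to zero at 1.0 m; it omits the adaptive beamformer and beat segmentation entirely. Keeping only the 18–22 kHz bins of the equalized spectrum plays the role of the audible-frequency filter, and because the chirp repeats back to back, each 50 ms frame is a circular shift of the transmitted chirp, so per-frame FFT deconvolution applies. All function names are hypothetical.

```python
import numpy as np

# Parameters from the text; C is an assumption (air at about 20 °C).
FS = 48_000                   # sampling rate (Hz)
F0, F1 = 18_000.0, 22_000.0   # chirp sweep band (Hz)
T_CHIRP = 0.05                # chirp duration (s)
C = 343.0                     # speed of sound (m/s), assumed

N = round(FS * T_CHIRP)                       # samples per chirp (2400)
_t = np.arange(N) / FS
CHIRP = np.cos(2 * np.pi * (F0 * _t + 0.5 * (F1 - F0) / T_CHIRP * _t**2))
X = np.fft.rfft(CHIRP)
FREQS = np.fft.rfftfreq(N, 1 / FS)            # 20 Hz bin spacing
BAND = slice(round(F0 * T_CHIRP), round(F1 * T_CHIRP) + 1)  # 18-22 kHz bins


def simulate_frames(ranges_per_frame, amps, noise_std=0.01, seed=0):
    """Toy stop-and-hop echoes: each frame is the chirp circularly delayed by
    the round-trip time to every reflector, scaled, plus white noise."""
    rng = np.random.default_rng(seed)
    frames = []
    for ranges in ranges_per_frame:
        spec = np.zeros_like(X)
        for r, a in zip(ranges, amps):
            spec += a * X * np.exp(-2j * np.pi * FREQS * (2 * r / C))
        frames.append(np.fft.irfft(spec, n=N) + noise_std * rng.standard_normal(N))
    return np.array(frames)


def impulse_responses(frames, eps=1e-6):
    """Frequency-domain equalization Y*conj(X)/(|X|^2 + eps), kept only inside
    the sweep band (discarding audible content), then an inverse FFT."""
    Y = np.fft.rfft(frames, axis=1)
    H = np.zeros((len(frames), N), dtype=complex)
    H[:, BAND] = Y[:, BAND] * np.conj(X[BAND]) / (np.abs(X[BAND]) ** 2 + eps)
    return np.fft.ifft(H, axis=1)   # complex (I/Q) taps, one per delay sample


def raised_cosine_gate(r_pass=0.8, r_stop=1.0):
    """Weight 1 up to r_pass, raised-cosine taper to 0 at r_stop, 0 beyond."""
    r = C * np.arange(N) / (2 * FS)             # one-way range of each tap (m)
    w = (r <= r_pass).astype(float)
    taper = (r > r_pass) & (r < r_stop)
    w[taper] = 0.5 * (1 + np.cos(np.pi * (r[taper] - r_pass) / (r_stop - r_pass)))
    return w


def chest_displacement(h, gate):
    """Track the strongest gated tap; convert unwrapped phase to motion (m)."""
    gated = h * gate
    tap = int(np.argmax(np.mean(np.abs(gated), axis=0)))
    phase = np.unwrap(np.angle(gated[:, tap]))
    fc = 0.5 * (F0 + F1)                        # band centre (Hz)
    return -(phase - phase[0]) * C / (4 * np.pi * fc)


# Usage: chest at 0.5 m moving 0.3 mm at 1.2 Hz (72 bpm), static wall at 2.0 m.
frame_t = np.arange(200) * T_CHIRP               # 10 s of back-to-back chirps
motion = 0.3e-3 * np.sin(2 * np.pi * 1.2 * frame_t)
frames = simulate_frames([(0.5 + m, 2.0) for m in motion], amps=(0.3, 0.6))
est = chest_displacement(impulse_responses(frames), raised_cosine_gate())
print(f"max tracking error: {np.max(np.abs(est - motion)) * 1e3:.3f} mm")
```

With this toy input, the gate zeroes the wall echo and the unwrapped phase of the strongest remaining tap follows the chest motion closely. Phase, not tap index, carries the motion: a 0.3 mm movement changes the round-trip delay by well under one 48 kHz sample.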
Hardware prototype: Because commercial devices do not give third parties access to raw microphone channels, the authors built a prototype from a UMA-8 SP seven-microphone array and a PUI Audio A5003B-3 speaker, arranged with ~3.4 cm spacing (similar to the Amazon Echo Dot geometry). Output sound pressure level (SPL) was ~75 dB at 50 cm (approx. 66 dB(A) at 20 kHz), and the sampling rate was 48 kHz.
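The two output levels quoted above are consistent with the standard A-weighting curve, which attenuates 20 kHz by roughly 9.3 dB. The sketch below uses the analytic A-weighting formula from IEC 61672; it is an assumption that the dB(A) figure was derived this way, and `a_weighting_db` is a hypothetical helper name.

```python
import math


def a_weighting_db(f_hz: float) -> float:
    """IEC 61672 analytic A-weighting gain (dB) at frequency f_hz."""
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalizes 1 kHz to ~0 dB


spl_db = 75.0                                   # unweighted level at 50 cm
print(round(a_weighting_db(20_000), 1))         # -9.3
print(round(spl_db + a_weighting_db(20_000)))   # 66
```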
Participants and study protocols:
- Healthy cohort (N=26): Adults without cardiac history completed seven 60-s sessions: distances of 40, 50, and 60 cm (inline, at nipple level); 50 cm with the speaker 10 cm above the chest; 50 cm at a 20° angle; background jazz at 75 dB(A); and post-exercise (>110 BPM). Participants otherwise breathed normally. Ground truth: Polar H10 chest ECG providing HR and R–R intervals.
- Hospitalized cardiac patients (N=24): Enrolled from an acute care cardiology unit and adjudicated by a cardiologist into regular vs irregular rhythm categories. Patients with BMI>35 were excluded from the main analysis (a separate obesity analysis included five patients with BMI 36–40). Ground truth: Polar H10 when feasible; otherwise CorSense finger PPG, because of telemetry export limitations. Patients sat upright in bed with the smart speaker ~50–60 cm away, inline at nipple level, and completed 5×60 s sessions with ambient noise controlled.
- Study design: Prospective, with no randomization or blinding. Statistical analysis included ICC and CCC for agreement, MAE and percentile errors for HR and R–R intervals, and condition-specific analyses (distance, angle, noise, exercise, sex, BMI).
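Several of the agreement metrics listed above can be computed from paired estimate/reference values in a few lines. The hypothetical `agreement` helper below is a sketch, not the study's analysis code: it computes Lin's concordance correlation coefficient with population moments, MAE, the 90th-percentile absolute error, and Bland–Altman bias with 95% limits of agreement (bias ± 1.96 SD of the differences), plus the fraction of pairs inside those limits, one common way Bland–Altman results are summarized. ICC is left out because the text does not specify which form (e.g., which Shrout–Fleiss model) was used.

```python
import numpy as np


def agreement(est, ref):
    """Agreement statistics between paired contactless estimates and
    reference (ECG/PPG) values, e.g. per-session HR or per-beat R-R."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = est - ref
    abs_err = np.abs(diff)
    # Lin's concordance correlation coefficient (population moments, ddof=0).
    cov = np.mean((est - est.mean()) * (ref - ref.mean()))
    ccc = 2 * cov / (est.var() + ref.var() + (est.mean() - ref.mean()) ** 2)
    # Bland-Altman bias and 95% limits of agreement.
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    within = np.mean(np.abs(diff - bias) <= half_width)
    return {
        "mae": abs_err.mean(),
        "p90_abs_err": np.percentile(abs_err, 90),
        "ccc": ccc,
        "bias": bias,
        "loa": (bias - half_width, bias + half_width),
        "frac_within_loa": within,
    }


# Usage with made-up heart rates (BPM): estimate vs reference.
stats = agreement([61, 69, 82, 88], [60, 70, 80, 90])
print(stats["mae"], stats["bias"], round(stats["ccc"], 3))
```

For this toy input the MAE is 1.5 BPM with zero bias; the CCC falls slightly below 1 because the estimates are a little less spread out than the reference.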
Healthy participants (N=26):
- Heart rate: Intraclass and concordance correlation coefficients (ICC/CCC) = 0.983/0.983. Mean absolute error (MAE) = 1 BPM; 90th percentile error < 4 BPM; bias ~0.12 BPM; LOA ~94.2%; N sessions = 156.
- R–R intervals: ICC = 0.929, CCC = 0.927; MAE = 28 ms (SD 49 ms); 90th percentile error = 75 ms; bias ~ -1.02 ms; N beats = 12,280. Mean absolute percentage error ~3.6% (SD 4.3%).
- Condition effects: Increasing the distance from 40 to 60 cm raised the median R–R error from 25 to 33 ms. With the speaker 10 cm above the chest, the median error was ~26 ms; at a 20° angle, ~31 ms. Background music increased the median error from 25 to 32 ms, and post-exercise the median error was ~32 ms vs 25 ms at rest. Errors were slightly higher with higher BMI (trend) and among females (median ~30 ms vs ~27 ms in males). The algorithm tolerated lateral placement, but performance degraded when the speaker was behind the participant.
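The per-beat R–R errors reported above presuppose two steps: turning beat segments into timestamps and pairing each acoustic beat with a reference beat. The sketch below takes beat times as midpoints of consecutive segment boundaries (as in step (5) of the processing description), differences them into R–R intervals, and pairs beats with a greedy nearest-neighbour matcher. That matcher and its 150 ms tolerance are hypothetical stand-ins for the paper's error-correcting matching, whose details are not given here; an R–R interval is scored only when both of its beats matched consecutive reference beats.

```python
import numpy as np


def beat_times_from_segments(boundaries_s):
    """Per-beat timestamps (s) as midpoints of consecutive segment boundaries."""
    b = np.asarray(boundaries_s, dtype=float)
    return 0.5 * (b[:-1] + b[1:])


def rr_intervals_ms(beat_times_s):
    """R-R intervals (ms) between successive beats."""
    return 1000.0 * np.diff(beat_times_s)


def mean_hr_bpm(rr_ms):
    """Average heart rate implied by a series of R-R intervals."""
    return 60_000.0 / np.mean(rr_ms)


def match_rr(est_beats_s, ref_beats_s, tol_s=0.15):
    """Greedy nearest-neighbour beat pairing (hypothetical); returns paired
    R-R intervals (ms) where both beats match consecutive reference beats."""
    est = np.asarray(est_beats_s, dtype=float)
    ref = np.asarray(ref_beats_s, dtype=float)
    pairs = []
    for i, t in enumerate(est):
        j = int(np.argmin(np.abs(ref - t)))
        if abs(ref[j] - t) <= tol_s:
            pairs.append((i, j))
    est_rr, ref_rr = [], []
    for (i0, j0), (i1, j1) in zip(pairs, pairs[1:]):
        if i1 == i0 + 1 and j1 == j0 + 1:   # consecutive on both streams
            est_rr.append(1000.0 * (est[i1] - est[i0]))
            ref_rr.append(1000.0 * (ref[j1] - ref[j0]))
    return np.array(est_rr), np.array(ref_rr)


# Usage: the last acoustic beat has no reference partner and is dropped.
est_rr, ref_rr = match_rr([0.42, 1.19, 2.05, 3.50], [0.40, 1.20, 2.00, 2.80])
mae_ms = np.mean(np.abs(est_rr - ref_rr))
mape_pct = 100 * np.mean(np.abs(est_rr - ref_rr) / ref_rr)
```

Here the two scored intervals are 770 and 860 ms against 800 ms references, giving an MAE of 45 ms.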
Hospitalized cardiac patients (N=24):
- Heart rate: MAE = 2 BPM; 90th percentile error < 8 BPM.
- R–R intervals: ICC = 0.901, CCC = 0.898; MAE ≈ 30 ms (SD 67.2 ms). Percent absolute error median ~4.0% (SD 7.6%).
- Irregular rhythms (including AF): Mean absolute R–R error ~35 ms; ICC/CCC ~0.891/0.890. No noticeable decrease in accuracy for irregular vs regular rhythms. In AF, large beat-to-beat variability (SDs ~95–233 ms) makes small measurement errors (<50 ms) clinically less significant for rhythm identification.
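The argument that errors below 50 ms matter little in AF can be checked with variance addition: if measurement error is independent of the true intervals, the observed R–R standard deviation is sqrt(SD_true² + SD_err²). The sketch below assumes an error SD of 50 ms (the bound quoted above treated as an SD, a simplification since MAE and SD differ) and evaluates both ends of the reported 95–233 ms range; `observed_sd` is a hypothetical helper name.

```python
import math


def observed_sd(true_sd_ms: float, error_sd_ms: float) -> float:
    """SD of measured R-R intervals when independent error adds variance."""
    return math.hypot(true_sd_ms, error_sd_ms)


for true_sd in (95, 233):
    obs = observed_sd(true_sd, 50)
    inflation_pct = 100 * (obs / true_sd - 1)
    print(f"true SD {true_sd} ms -> observed {obs:.1f} ms (+{inflation_pct:.1f}%)")
```

Under these assumptions, measurement error inflates the apparent variability by roughly 13% at the low end and about 2% at the high end, so the large irregularity typical of AF remains evident.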
Extreme obesity (BMI 36–40; n=5): Usable cardiac rhythms were extracted in 3 of 5 patients, likely because excess adipose tissue attenuates chest surface motion; this is consistent with known limitations of echocardiography and optical vibrocardiography in severe obesity.
Overall: The system reliably identified individual heartbeats and computed HR and R–R intervals contactlessly at sub-meter distances in both healthy and cardiac patient cohorts, including irregular rhythms.
The findings demonstrate that commodity-like smart speaker hardware, combined with tailored sonar signal processing and adaptive beamforming, can noninvasively capture individual cardiac beats and compute HR and R–R intervals with high agreement to ECG/PPG references. Crucially, unlike frequency-domain methods that assume periodicity, the proposed time-domain segmentation handles irregular rhythms, supporting potential use in atrial fibrillation detection and HRV assessment. The approach is resilient to common real-world variations (moderate changes in distance/angle, moderate ambient noise) and shows only modest degradation post-exercise.
Clinically, contactless monitoring could reduce burden in settings where electrode placement is impractical (e.g., infectious isolation, skin sensitivity), enable remote telemedicine assessments, and provide opportunistic screening using smart speakers already deployed in homes and hospitals. Privacy concerns are mitigated because the system is a short-range, inaudible active sonar: audible frequencies are filtered out, so intelligible speech is not analyzed, and sensing works only when the user is close to the device. Differences between healthy and hospitalized populations (e.g., vascular stiffness with age or hypertension, medications affecting autonomic tone) may reduce chest surface motion and HRV, but the system still achieved clinically acceptable errors. Limitations include reduced performance in severe obesity and the need for the user to remain still within ~1 m, which calls for careful deployment guidelines. Overall, the results support feasibility and motivate integration into consumer devices with appropriate privacy controls.
This proof-of-concept converts a smart speaker into a short-range active sonar to contactlessly measure heart rate and beat-to-beat R–R intervals, including irregular rhythms such as atrial fibrillation. Across healthy participants and hospitalized cardiac patients, the system achieved low MAE in HR (1–2 BPM) and R–R intervals (~28–35 ms) with high ICC/CCC. The method is robust to moderate variations in placement, angle, and background noise, and shows promise for applications in telemedicine, screening, and monitoring of patients where contact sensors are undesirable. Future work includes optimizing for diverse body types (particularly severe obesity), improving robustness during motion or post-exertion, broader validation with gold-standard clinical ECG across larger and more diverse cohorts, and integration into commercial smart speakers once access to raw microphone arrays is enabled by manufacturers with strong privacy protections.
- Hardware and access: The prototype used a third-party microphone array because consumer smart speakers do not expose raw multi-microphone signals to developers. Commodity devices offer limited inaudible bandwidth (18–22 kHz) and 48 kHz sampling, which constrain SNR and performance.
- User and environmental constraints: Requires subject within ~1 m, relatively still, facing the speaker; performance degrades with increased distance, behind-body placement, high ambient noise, and post-exercise motion/noise.
- Population: Main patient cohort excluded BMI >35; in a small severe-obesity subset (BMI 36–40), only 3/5 yielded usable signals, limiting generalizability to this group.
- Ground truth: Some hospitalized patients used finger PPG (CorSense) rather than continuous clinical ECG due to telemetry export limitations, introducing potential reference error.
- Study design: Prospective but non-randomized, non-blinded; relatively small single-center cohorts; short 60-s sessions.
- Safety/audibility: 18–22 kHz is generally inaudible to adults but can be audible to younger individuals; possible sensitivity of pets to high-frequency sound is noted, although 18–30 kHz has not been reported to affect animals.