Detection and monitoring of stress using wearables: a systematic review

A. Pinge, V. Gad, et al.

Wearable devices are reshaping mental health monitoring by using continuous sensor data to detect and track stress early. This systematic review examines the sensors and devices used, along with steps such as data collection, preprocessing, feature computation, and model training, and outlines future research directions in stress management. Research conducted by Anuja Pinge, Vinaya Gad, Dheryta Jaisinghani, Surjya Ghosh, and Sougata Sen.
Introduction

The paper addresses the need for continuous, objective stress detection and monitoring leveraging wearable devices and machine learning. Traditional stress assessments (questionnaires, interviews, cortisol/hormone assays) provide momentary snapshots and suffer from subjectivity, burden, and intrusiveness. Smartphones and wearables enable continuous sensing of physiological stress responses (elevated heart rate, sweating/EDA, skin temperature changes, respiration) and facilitate ecological momentary assessments (EMAs) for in-situ ground truth. The review focuses on field studies using wearables and ML for stress detection/monitoring, excluding works relying solely on public datasets, to illuminate practical steps (data collection, preprocessing, feature engineering, modeling, and evaluation). It formulates research questions on: (Q1) which physiological signals/wearables are used and related challenges; (Q2) data collection methods; (Q3) ML pipeline methods; and (Q4) limitations and future directions.

Literature Review

Prior surveys and reviews have examined wearable-based stress detection but with gaps: reviews focusing on public datasets (e.g., WESAD-centric analyses), on specific sensors (e.g., EDA-only), or lacking detailed coverage of device types, preprocessing steps, feature computation, and ML/methodological specifics. This review uniquely provides fine-grained details of sensors in wearables, preprocessing pipelines, features, ML techniques, metrics, and end-to-end workflows from field studies involving user data collection, addressing an identified gap in methodological granularity.

Methodology

Design: Systematic review following PRISMA guidelines.
Databases: Google Scholar, ACM Digital Library, PubMed, IEEE Xplore.
Search terms: combinations of "wearables", "automated", "stress", "monitoring", "detection".
Search timing: First search August 2023; second search December 2023; studies published after December 2023 were not included.
Selection: Top 50 results per keyword group per database, yielding 400 records (100 per database). After removing 66 duplicates, 334 records remained. Excluded: published before 2018 (122), non-human studies (13), not utilizing wearables (24), not on stress detection/monitoring (71), participants outside ages 18–60 (16), studies using publicly available datasets without user studies (28), and survey/review papers (21). After exclusions, 39 articles were included.
Inclusion criteria: Peer-reviewed studies (2018 onward) with human participants aged 18–60, wearable-based physiological stress detection/monitoring, field/user studies, and machine learning or analytical approaches applied to collected data.
Data extracted: Wearable devices and sensors used; data collection settings (lab stressors vs free-living); ground truth methods; preprocessing techniques; features; ML models; performance metrics.
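The record-flow arithmetic above can be sanity-checked with a short sketch (counts are taken from the review; the variable names and abbreviated exclusion labels are illustrative):

```python
# Reproduce the PRISMA-style record flow reported in the review.
databases = ["Google Scholar", "ACM Digital Library", "PubMed", "IEEE Xplore"]
records = 100 * len(databases)      # 100 records per database -> 400 total

duplicates = 66
after_dedup = records - duplicates  # 334 records screened

exclusions = {
    "published before 2018": 122,
    "non-human studies": 13,
    "not utilizing wearables": 24,
    "not on stress detection/monitoring": 71,
    "age outside 18-60": 16,
    "public datasets only, no user study": 28,
    "survey/review papers": 21,
}
included = after_dedup - sum(exclusions.values())
print(f"screened={after_dedup}, excluded={sum(exclusions.values())}, included={included}")
```

Running the sketch confirms that the reported counts are internally consistent (334 screened, 295 excluded, 39 included).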

Key Findings
  • Physiological signals: Heart activity (ECG, PPG; HR, HRV, IBI, BVP), electrodermal activity (EDA/GSR; phasic SCR and tonic SCL), respiration rate, skin temperature, accelerometer/activity, and less commonly SpO2, blood pressure, EEG, EMG.
  • Wearables: Both off-the-shelf and custom devices are used; off-the-shelf dominate (~85%), primarily chest- and wrist-worn. Notable devices include chest straps (Polar H7/H10, Zephyr BioHarness 3, Shimmer 3 ECG, AutoSense, BIOPAC) and wristbands/watches (Empatica E4, Fitbit, Garmin Vivosmart 4, Samsung Gear Sports). Figure data indicates usage shares such as Empatica E4 (~35%), Samsung Gear Sports (~7.5%), and several chest devices each ~4–5%.
  • Data collection: Lab stressors include TSST (speech + mental arithmetic), Stroop Color-Word Test, memory search, horror/affect-inducing videos, acoustic stressors (air horn), treadmill walking, and cold pressor tasks; free-living studies range from hours to weeks/months, enabling naturalistic monitoring but with adherence/technical challenges.
  • Ground truth: Lab protocols label stressor periods as "stressed"; free-living uses EMAs (e.g., DASS, STAI, PSS) or custom prompts. Response burden, recall bias, and residual stress complicate labels; incentives and protocol adjustments are used.
  • Preprocessing: Common filters for ECG/PPG include Butterworth, FFT, DWT, Chebyshev II, bandpass and smoothing; peak detection for HR derivation; EDA separated into phasic/tonic with filtering; outlier removal and interpolation (linear, cubic spline); normalization; time-to-frequency transforms (FFT) for spectral features.
  • Features: Time-domain (mean, median, min, max, SD of HR/RR), HRV metrics (RMSSD, SDSD, NN50/PNN50, SDNN, triangular index, TINN), frequency-domain (VLF, LF, HF, LF/HF, total power, energy, pLF/pHF), EDA statistical and SCR/SCL-derived features (startle counts, amplitudes, rise/fall times), respiration statistics, accelerometer axes/magnitude, and others (blood pressure/oximeter).
  • ML techniques: Classical ML—Random Forest, XGBoost, SVM, KNN, Naive Bayes, Logistic/Linear Regression; Deep Learning—MLP, LSTM (for temporal dynamics), CNN. Random Forest and XGBoost frequently yield strong performance; deep learning is emerging but data/computation needs are noted.
  • Metrics: Accuracy for balanced datasets; Precision/Recall/F1-score for imbalanced data; Sensitivity/Specificity and ROC-AUC.
  • Performance examples: RF often 76.5–88.2% accuracy; SVM sometimes better than RF in specific setups (e.g., 82% vs 80%); Decision Trees reported up to 95% in one study; highlights dependence on dataset, features, and context.
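Two of the preprocessing steps above — a zero-phase Butterworth band-pass of the kind applied to ECG/PPG, and a separation of EDA into tonic and phasic components — can be sketched minimally with SciPy. The cutoff values, sampling rate, and the simple low-pass tonic/phasic split are illustrative assumptions, not the specific pipelines of the reviewed studies (which vary, and often use more elaborate EDA decompositions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, fs, low=0.5, high=8.0, order=4):
    """Zero-phase Butterworth band-pass (illustrative pulse band for PPG)."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, x)  # filtfilt avoids phase distortion

def eda_split(eda, fs, cutoff=0.05, order=2):
    """Split EDA into slow tonic level (low-pass) and fast phasic residual."""
    nyq = fs / 2.0
    b, a = butter(order, cutoff / nyq, btype="low")
    tonic = filtfilt(b, a, eda)
    phasic = eda - tonic
    return tonic, phasic

fs = 32.0                                  # assumed wearable sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
ppg = 0.05 * t + np.sin(2 * np.pi * 1.2 * t)          # drift + ~72 bpm pulse
eda = 2.0 + 0.1 * t + 0.2 * np.sin(2 * np.pi * 0.5 * t)  # synthetic EDA trace

clean_ppg = bandpass(ppg, fs)              # drift removed, pulse band kept
tonic, phasic = eda_split(eda, fs)         # tonic + phasic reconstruct the input
```

Note that the low-pass tonic/phasic split is a deliberate simplification; dedicated deconvolution methods are common in practice.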
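The time-domain HRV metrics listed above (RMSSD, SDNN, NN50/pNN50) follow standard definitions over RR intervals; a small NumPy sketch with made-up RR intervals (the function name and sample values are illustrative):

```python
import numpy as np

def hrv_features(rr_ms):
    """Common time-domain HRV metrics from RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    rmssd = np.sqrt(np.mean(diff ** 2))    # root mean square of successive diffs
    sdnn = np.std(rr, ddof=1)              # SD of all NN intervals
    nn50 = int(np.sum(np.abs(diff) > 50))  # successive diffs exceeding 50 ms
    pnn50 = 100.0 * nn50 / len(diff)       # NN50 as a percentage
    return {"RMSSD": rmssd, "SDNN": sdnn, "NN50": nn50, "pNN50": pnn50}

rr = [812, 790, 845, 830, 902, 780, 810]   # synthetic RR intervals (ms)
feats = hrv_features(rr)
```

Frequency-domain features (LF, HF, LF/HF) are computed analogously from the power spectrum of the interpolated RR series.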
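The common classical pipeline — Random Forest evaluated with accuracy, F1 (preferred under imbalance), and ROC-AUC — can be sketched with scikit-learn. The synthetic feature matrix below merely stands in for extracted HRV/EDA features; hyperparameters and the 70/30 class imbalance are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a stressed/not-stressed feature matrix; the class
# imbalance mimics the relative rarity of stressed periods in field data.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)            # more informative than accuracy here
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.3f}  F1={f1:.3f}  ROC-AUC={auc:.3f}")
```

In the reviewed field studies, evaluation is typically subject-wise (leave-one-participant-out) rather than a random split, to avoid leaking a participant's data across train and test sets.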

Discussion

Findings demonstrate that wearable-based physiological sensing, coupled with machine learning, can detect and monitor stress across lab and real-life contexts. The review clarifies end-to-end pipelines, common sensors/devices, and effective preprocessing/feature strategies, addressing Q1–Q3. Significance: It highlights practical trade-offs—accuracy vs comfort (ECG chest straps vs PPG wrist devices), sensor availability vs cost, and on-device vs offloaded computation—informing study design. Ground truth challenges (EMA burden, residual stress) directly affect model validity; human-in-the-loop strategies and micro-EMAs may reduce burden. Random Forest/XGBoost are robust baselines; LSTM/CNN show promise for temporal and representational learning but require careful evaluation of gains vs complexity and data demands. Relevance: The synthesis aids researchers in selecting devices, signals, and ML methods for target populations and contexts, and underscores the need for real-time in-situ detection to enable stress management (e.g., JITAI). It also warns against over-reliance on public datasets due to demographic and modality constraints, encouraging diverse, longitudinal field studies.

Conclusion

This systematic review consolidates state-of-the-art wearable-based stress detection and monitoring methods, detailing devices/sensors, data collection (lab/free-living), preprocessing, features, machine learning techniques, and evaluation metrics across 39 field studies. It provides a methodological blueprint for new researchers, identifies robust approaches (e.g., RF/XGBoost, HRV and EDA features), and highlights emerging deep learning opportunities. Future work should pursue: real-time, on-device detection; improved, low-burden ground truth via human-in-the-loop and micro-EMAs; enhanced usability and social acceptability of wearables; comprehensive studies that integrate context and interventions (JITAI) to support stress management; and broader, diverse datasets to improve generalizability.

Limitations

The review excludes studies prior to 2018 and those using only publicly available datasets, potentially omitting historical context and benchmarking insights. It does not assess clinical acceptability of devices, user acceptability/usability (battery life, heating, comfort), or privacy/security of data and models. Theoretical pros/cons of ML models are not extensively analyzed. Most included works focus on offline analysis; onboard computational capabilities for real-time detection are not detailed.
