Psychology

Differential temporal utility of passively sensed smartphone features for depression and anxiety symptom prediction: a longitudinal cohort study

C. A. Stamatis, J. Meyerhoff, et al.

A large-scale smartphone passive-sensing study (n = 1013) using the LifeSense app tracked GPS, app/device use, and communication over 16 weeks to identify digital markers of depression and anxiety. Spending more time at home relative to one’s own average predicted future depressive symptoms, while circadian movement was only a proximal correlate.

Introduction

The study addresses how passively sensed smartphone data relate to depression and anxiety symptoms with attention to temporal scale (data window and lag), symptom specificity (depression vs. anxiety), and within-person versus between-person effects. Prior work suggests associations between communication patterns, language in texts, keystroke dynamics, and GPS-derived mobility with affective symptoms, but replication in larger, heterogeneous samples and clarity on optimal time windows/lags are needed. The primary objective was to evaluate smartphone sensor-based markers that prospectively relate to depression and anxiety severity, examining within- and between-person associations across shifting 2-week sensor windows predicting symptoms at distal (2-week lag), medial (1-week lag), and proximal (0-week lag) times.
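The three lags can be made concrete with a small date calculation. The sketch below is illustrative only: it assumes the 2-week window is counted in days and ends on the assessment day shifted back by the lag, and the function name `sensor_window` and the example date are hypothetical rather than taken from the study.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # 2-week sensor window, as described in the study
LAG_WEEKS = {"distal": 2, "medial": 1, "proximal": 0}  # lags named in the text


def sensor_window(assessment_day: date, lag_weeks: int) -> tuple[date, date]:
    """Return the (start, end) days of the sensor window, both inclusive.

    Assumption: the window ends on the assessment day moved back by the lag.
    """
    end = assessment_day - timedelta(weeks=lag_weeks)
    start = end - timedelta(days=WINDOW_DAYS - 1)
    return start, end


assessment = date(2020, 3, 29)  # hypothetical assessment date
windows = {name: sensor_window(assessment, lag) for name, lag in LAG_WEEKS.items()}

for name, (start, end) in windows.items():
    print(f"{name:8s} {start} to {end}")
```

Under these assumptions the proximal window runs up to the assessment day itself, while the medial and distal windows end one and two weeks earlier.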

Literature Review

Existing studies link smartphone-derived features (e.g., number/type of calls/texts, text message content, keystroke patterns) and GPS-based mobility indicators with affective symptoms. However, replication and generalizability issues persist, with calls for larger, more diverse samples. Temporal characteristics vary across studies, with some using 24-hour windows to predict outcomes within hours or a day, and others using multi-week windows and lags. Prior work by the authors found a 4-week data window optimal for predicting depression from text message language, while social media studies suggested 2-month windows with 2–4-week lags. These mixed findings underscore the importance of systematically evaluating data window and lag when relating sensed features to mental health outcomes.

Methodology

- Design: Longitudinal cohort study over 16 weeks with continuous passive sensing via the LifeSense app and repeated symptom assessments.
- Participants: 1,093 enrolled across three recruitment waves (2019–2021); data available from 1,013 participants (74.6% female; mean age 40.9 years, SD 12.7). Participants were compensated up to $142 plus adherence bonuses.
- Inclusion: ≥18 years, U.S. resident, English-reading, Android smartphone with active data/text; stratified sampling for depression severity (PHQ-8 ≥10 for at least 50% in Waves 1–2; all PHQ-8 ≥10 in Wave 3).
- Exclusion: self-reported bipolar disorder, manic/hypomanic episode, schizophrenia, or other psychotic disorder.
- Sensing: Continuous passive collection of GPS-based data, app and device use, and communication metadata. Sensor features were clustered and aggregated over 2-week windows.
- Assessments: PHQ-8 via app at the beginning and end of every third week (modified to a past-week timeframe due to cadence). GAD-7 via REDCap at baseline and every 3 weeks (weeks 1, 4, 7, 10, 13, 16; standard past-2-weeks instructions). Total scores collected: PHQ-8 (n=4731; 6.59% missing of 5065), GAD-7 (n=4649; 8.21% missing of 5065).
- Analytic approach: Multilevel linear regression (R, lmerTest, maximum likelihood). For each sensor predictor, models included both between-person (person mean) and within-person (person-mean-centered deviation) terms. Covariates: study week (centered), age (centered), gender, urbanicity/rurality; random intercepts.
- Lagged models: Three lagged models per outcome used a sliding 2-week sensor window: distal prediction (2-week lag), medial prediction (1-week lag), and proximal prediction (0-week lag). Distal/medial windows had no overlap with symptom reporting; proximal included the week concurrent with reporting. Model R² reported for each lag.
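The between-person and within-person terms described in the analytic approach can be illustrated with a short decomposition. This is a minimal sketch using made-up values, not the authors' code (the study used R with lmerTest): each observation is split into the participant's own mean (between-person term) and the deviation from that mean (within-person term). The variable names and the example rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (participant_id, hours at home per day in one 2-week window)
rows = [
    ("p1", 10.0), ("p1", 12.0), ("p1", 14.0),
    ("p2", 18.0), ("p2", 20.0), ("p2", 22.0),
]

# Group the sensor values by participant
by_person = defaultdict(list)
for pid, value in rows:
    by_person[pid].append(value)

# Between-person term: each participant's mean across windows
person_mean = {pid: mean(values) for pid, values in by_person.items()}

# Within-person term: deviation of each window from the participant's own mean
decomposed = [
    {"pid": pid, "between": person_mean[pid], "within": value - person_mean[pid]}
    for pid, value in rows
]

for row in decomposed:
    print(row)
```

In this toy data the two participants differ by 8 hours on the between-person term, yet both show the same within-person swings of 2 hours around their own averages. Entering both terms in a multilevel model lets the two sources of variation receive separate coefficients; the model fit itself is not shown here.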

Key Findings

- Location features and PHQ-8 (depression):
  - Within-person increases in home duration predicted higher PHQ-8 across all lags: distal β=0.219, p=0.012; medial β=0.198, p=0.022; proximal β=0.183, p=0.045; no significant between-person effects or associations with GAD-7.
  - Greater between-person GPS variability and mobility associated with lower next-week PHQ-8 (medial β=-0.503, p=0.046); not significant for distal (β=-0.464, p=0.073) or proximal (β=-0.424, p=0.093).
  - Within-person time in more frequently visited venues associated with lower PHQ-8 for medial (β=-0.185, p=0.003) and proximal (β=-0.168, p=0.007) windows; not distal (β=-0.064, p=0.308).
  - Within-person circadian movement associated with lower proximal PHQ-8 (β=-0.131, p=0.035); not predictive at distal (β=0.034, p=0.577) or medial (β=-0.089, p=0.138) lags.
- Communication features:
  - Within-person app-based messaging time associated with higher proximal PHQ-8 (β=0.162, p=0.015); not significant at distal (β=0.059, p=0.385) or medial (β=0.115, p=0.083).
  - Between-person app-based messaging associated with higher GAD-7 for distal (β=0.486, p=0.041) and medial (β=0.481, p=0.046); proximal non-significant (β=0.466, p=0.053). No significant within-person app-based messaging effects on GAD-7.
  - Within-person increases in call/text communication associated with higher GAD-7 across all lags: distal β=0.279, p=0.005; medial β=0.386, p<0.001; proximal β=0.293, p=0.003. No significant associations with PHQ-8.
- Other phone use:
  - Between-person greater screen-on time associated with higher PHQ-8 at distal (β=0.503, p=0.016) and proximal (β=0.541, p=0.012) lags; medial not significant (β=0.272, p=0.196).
  - Launcher use associated with lower PHQ-8 between-person across all lags (distal β=-0.596, p=0.008; medial β=-0.525, p=0.018; proximal β=-0.653, p=0.004) and within-person proximally (β=-0.161, p=0.023); not associated with GAD-7.
- Demographics and time:
  - Younger age and female gender associated with higher PHQ-8 and GAD-7 (age β range approximately -0.573 to -1.163; p≤0.001; male vs. female β range approximately -0.360 to -0.563; p≤0.036).
  - Rural residence associated with higher GAD-7 (β≈ -0.520 to -0.532 for urban vs. rural; p=0.002); not significant for PHQ-8.
  - Symptom severity decreased over study weeks for both PHQ-8 and GAD-7 (β range -0.107 to -0.183; p<0.001).
- Model fit: Modest explained variance: PHQ-8 R² distal=0.049, medial=0.048, proximal=0.053; GAD-7 R² distal=0.058, medial=0.056, proximal=0.057.
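The lag pattern of the within-person PHQ-8 results listed above can be tabulated in a few lines. The sketch below copies the reported p-values and lists the lags at which each feature falls below a threshold; the 0.05 cutoff is an assumption (it matches which results the summary calls significant), and the helper name `significant_lags` is hypothetical.

```python
ALPHA = 0.05  # assumed significance threshold
LAGS = ("distal", "medial", "proximal")

# Within-person PHQ-8 p-values in (distal, medial, proximal) order, as reported above
p_values = {
    "home duration": (0.012, 0.022, 0.045),
    "frequently visited venues": (0.308, 0.003, 0.007),
    "circadian movement": (0.577, 0.138, 0.035),
    "app-based messaging": (0.385, 0.083, 0.015),
}


def significant_lags(ps, alpha=ALPHA):
    """Return the names of the lags whose p-value is below alpha."""
    return [lag for lag, p in zip(LAGS, ps) if p < alpha]


summary = {feature: significant_lags(ps) for feature, ps in p_values.items()}

for feature, lags in summary.items():
    print(f"{feature}: {', '.join(lags)}")
```

Read this way, home duration is the only within-person feature significant at every lag, whereas circadian movement and app-based messaging appear only at the proximal lag.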

Discussion

Findings demonstrate differential temporal and symptom-specific utility of smartphone-sensed features. Within-person increases in time at home function as a robust early signal of rising depressive symptoms across distal to proximal windows, supporting the potential of home duration as a target for proactive, individualized interventions (e.g., behavioral activation). In contrast, circadian movement changes align with concurrent or near-term depression rather than prospectively predicting future symptoms, suggesting utility for detecting imminent symptom burden rather than early warning. Communication modalities show distinct relationships: within-person app-based messaging relates to depression only at the proximal lag, and between-person app-based messaging relates to anxiety severity, while within-person increases in calling/texting consistently signal higher anxiety across time windows. These patterns highlight that feature type and prediction lag critically shape clinical interpretability and use in just-in-time adaptive interventions. Features significant primarily at the between-person level (e.g., launcher use with PHQ-8, app-based messaging with GAD-7) are less informative for individualized change detection.

Conclusion

This large-scale smartphone sensing study identified that location features—especially within-person increases in home duration—are prospective markers of intra-individual changes in depression severity, while communication features (app-based messaging, calls/texts) relate to both depression and anxiety with modality- and lag-specific patterns. The multilevel, longitudinal approach clarifies which signals are early indicators versus concurrent correlates, informing personalization strategies for digital mental health and clinical decision support. Future work should improve predictive performance (e.g., via machine learning), vary data windows and lags, assess prediction accuracy metrics, test causal interventions targeting sensed constructs, and evaluate generalizability across more diverse populations and contexts.

Limitations

- Modest explained variance (approximately 5–6% across outcomes and lags) relative to early sensing studies.
- Correlational design despite lagging sensors and assessments; no causal inferences can be made.
- Fixed 2-week sensor data window; other windows may alter predictive performance.
- Potential impact of missing data and its temporal dynamics not fully explored.
- COVID-19–related environmental changes during data collection may have attenuated associations (e.g., routines, mobility).
- Differences in delivery and reference periods for GAD-7 (REDCap; past 2 weeks) versus PHQ-8 (in-app; past week) may affect responses.
- Limited demographic diversity; generalizability to broader populations requires further study.
