Mobile sensing-based depression severity assessment in participants with heterogeneous mental health conditions

B. Lamichhane, N. Moukaddam, et al.

Mobile sensing augmented with free-living audio captures sociability signals that improve depression severity assessment across healthy and clinical populations. In a weeklong study of healthy participants and participants with major depressive disorder or schizoaffective disorder, adding sociability features derived from audioband recordings reduced RMSE from 6.80 to 6.07, raised five-class F1 from 0.34 to 0.46, and yielded a strong correlation (r = 0.76, p < 0.001) between predicted and reported severity. Research was conducted by Bishal Lamichhane, Nidal Moukaddam, and Ashutosh Sabharwal.

Introduction

Depression affects about 5% of adults globally and is associated with poor health outcomes and functional impairments. Routine assessment of depression severity can aid timely diagnosis and treatment monitoring, but commonly used tools like the Patient Health Questionnaire (PHQ-9) rely on subjective self-report and can suffer from recall bias. Objective, frequent, and unobtrusive assessments via mobile sensing (smartphone and wearable sensors) offer a complementary approach by capturing behavioral manifestations of depression, including sociability and mobility patterns. Prior studies have shown feasibility of mobile sensing for depression severity assessment using communication logs (calls/texts) and location-derived mobility features, but most were conducted in homogeneous populations (e.g., students or single-diagnosis clinical cohorts), limiting generalizability to heterogeneous, transdiagnostic settings. Moreover, free-living audio—a modality that can capture in-person social interactions—has been underexplored for depression severity prediction outside controlled lab settings. This study investigated (1) whether features extracted from smartphone communication logs and location data can model PHQ-9 depression severity across a heterogeneous cohort including healthy individuals and patients with depressive and psychotic disorders; and (2) whether sociability features derived from continuous free-living audio complement communication and location features to improve depression severity prediction.

Literature Review

Previous mobile sensing studies have demonstrated associations between depression severity and behaviors inferred from communication logs, sleep, activity, stress, eating, and mobility. However, many studies focused on relatively homogeneous groups (e.g., students with minimal to mild symptoms) or clinical samples restricted to a single primary diagnosis, limiting transdiagnostic generalization. Reviews have identified useful feature families (e.g., communication and mobility) but highlighted heterogeneity in feature definitions, study designs, and reporting that hampers meta-analysis. Some works suggested distinct features might be needed for clinical versus non-clinical populations, which complicates deployment of primary screening tools. Free-living audio has shown promise for capturing sociability, with prior associations observed between depression and conversational network size, social ambiance, and dyadic interaction patterns using audio and RFID. Speech-based markers under controlled settings (e.g., response time, acoustic/semantic features) have also correlated with depression, suggesting free-living audio could provide naturalistic social functioning measures relevant to depressive states. Large-scale work has questioned generalizability of GPS mobility features to broad populations, underscoring the need to test models in heterogeneous cohorts and explore complementary modalities like free-living audio.

Methodology

Study design and participants: Cross-sectional pilot with 32 participants (average age 45.2 ± 13.6 years; 19 female/13 male): 11 healthy controls, 13 with major depressive disorder (MDD), and 8 with schizoaffective disorders. All patients were stable for outpatient management. The cohort was racially/ethnically diverse (50% Black/African American, 21.88% Hispanic, 18.75% white). IRB approvals were obtained (Rice University, Baylor College of Medicine, Harris Health Systems; protocol H-41811), and participants gave written informed consent.

Data collection: Over one week per participant, a study smartphone app collected communication logs (incoming/outgoing calls and text messages: timestamps, call durations; phone numbers one-way hashed) and GPS location data (available for 19 participants). A wearable audioband recorded continuous daytime free-living audio (~8 AM–8 PM, ~12 hours/day) across 7 days.

Feature engineering:

  • Communication log features (calls, texts, and combined call+text): per-day averages computed over study period. For each modality, five features were extracted: (1) average daily number of communications; (2) asymmetry coefficient AC = (out − in)/(out + in); (3) skewness coefficient SC = sign(out − in)·(1 − H(out,in)), where H is Shannon entropy on incoming/outgoing proportions; (4) average incoming network size (unique contacts/day); (5) average outgoing network size (unique contacts/day). Additionally, average call duration/day. Total of 16 features (5 each for call, text, combined; plus call duration).
  • Mobility features from GPS (for 19 participants): GPS preprocessed to estimate instantaneous speed (Vincenty’s formula for distance). Stationary points identified via speed threshold of 1.4 m/s. Features: (1) average distance traveled/day (sum of distances at stationary points normalized by number of monitoring points); (2) location variance = log(σ²_lat + σ²_lon); (3) normalized location cluster entropy using DBSCAN clustering of stationary points: entropy = −∑_i t_i log t_i, normalized by log(k), where k is number of clusters and t_i is time proportion per cluster.
  • Audio-based sociability features from free-living audio using ECoNet pipeline: Voice Activity Detection (Pyannote/PyanNet) to segment speech; speaker embeddings via TDNN/x-vector model trained on large public speaker datasets; unsupervised clustering to assign speaker labels; random-forest-based filtering of spurious labels; multi-day speaker tracking. Extracted features: (1) conversational network size (average unique conversation partners/day); (2) top speakers’ speech ratio = t_top-2-speakers / t_total-speech (dyadic interaction proxy); (3) response time in dyadic interaction.
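
The communication and mobility formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (that lives in the repository listed under Data and code availability). The function names are hypothetical, and several details are assumptions the summary does not state: the entropy in SC is taken in base 2 so that H lies in [0, 1], days with no communications return 0, population variance is used for location variance, and the time spent in each DBSCAN cluster is assumed to be already computed.

```python
import math
import statistics


def asymmetry_coefficient(n_out, n_in):
    """AC = (out - in) / (out + in); returns 0.0 when there were no communications."""
    total = n_out + n_in
    return (n_out - n_in) / total if total else 0.0


def skewness_coefficient(n_out, n_in):
    """SC = sign(out - in) * (1 - H), with H the Shannon entropy of the in/out split.

    Assumption: H is measured in bits (log base 2), so H is 1 for a balanced split.
    """
    total = n_out + n_in
    if total == 0 or n_out == n_in:
        return 0.0  # sign(0) = 0, so a balanced or empty log gives SC = 0
    proportions = (n_out / total, n_in / total)
    entropy = -sum(p * math.log2(p) for p in proportions if p > 0)
    sign = 1.0 if n_out > n_in else -1.0
    return sign * (1.0 - entropy)


def location_variance(lats, lons):
    """log(var_lat + var_lon) over stationary points (population variance assumed)."""
    total = statistics.pvariance(lats) + statistics.pvariance(lons)
    return math.log(total) if total > 0 else float("-inf")


def normalized_cluster_entropy(time_per_cluster):
    """-sum(t_i * log t_i) / log(k), where t_i is the time share of cluster i."""
    k = len(time_per_cluster)
    total = sum(time_per_cluster)
    if k < 2 or total <= 0:
        return 0.0  # a single cluster carries no location diversity
    shares = [t / total for t in time_per_cluster if t > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(k)


if __name__ == "__main__":
    # Hypothetical day: 3 outgoing and 1 incoming text messages.
    print(asymmetry_coefficient(3, 1), skewness_coefficient(3, 1))
    # Hypothetical minutes spent in each of three location clusters.
    print(normalized_cluster_entropy([600, 90, 30]))
```

Under these assumptions a positive AC or SC indicates a proactive pattern (more outgoing than incoming) and a negative value a reactive one; SC approaches ±1 as the split becomes one-sided.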

Modeling and evaluation:

  • Depression severity regression: Ridge regression (L2-regularized linear model) predicting PHQ-9 total score. Evaluated in leave-one-participant-out (LOPO) cross-validation. Regularization hyperparameter α selected via inner cross-validation on the training fold. Baseline: mean PHQ-9 of training set. Missing GPS-derived features imputed via k-nearest neighbors (k=3) within training folds. Feature set configurations: Model A (communication + location features only), Model B (audio features only), Model C (early fusion: all features), and Late Fusion (average of predictions from A and B). Metrics: RMSE and normalized RMSE (nRMSE = RMSE normalized by observed PHQ-9 range); Pearson correlation between predicted and reported PHQ-9.
  • Depression category classification: Five-class mapping of PHQ-9 (0–4 none, 5–9 mild, 10–14 moderate, 15–19 moderately severe, 20–27 severe). Classifier: Ordinal Logistic Regression (all-threshold variant), with comparison to multinomial Logistic Regression (L2). Baseline: always predict dominant class in training set. Metric: F1 score (weighted across classes). Binary detector also evaluated (depression present if PHQ-9 ≥ 5) with F1 metric.
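
The evaluation scaffold described above can be sketched as follows. This covers only the pieces that are fully specified in the text: the five-class PHQ-9 mapping, the leave-one-participant-out mean baseline, RMSE and nRMSE, late fusion by averaging, and a class-weighted F1. The ridge and ordinal logistic regression models themselves are not reproduced, and all function names are hypothetical.

```python
import math


def phq9_category(score):
    """Map a PHQ-9 total (0-27) to the five severity classes listed above."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total must be in 0..27")
    bounds = ((4, "none"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe"))
    for upper, label in bounds:
        if score <= upper:
            return label


def rmse(y_true, y_pred):
    """Root-mean-square error between reported and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse(y_true, y_pred):
    """RMSE divided by the observed range of the reported scores."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))


def lopo_mean_baseline(scores):
    """Predict each held-out participant as the mean score of all other participants."""
    n, total = len(scores), sum(scores)
    return [(total - s) / (n - 1) for s in scores]


def late_fusion(pred_a, pred_b):
    """Average the per-participant predictions of two models."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    score = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += f1 * (tp + fn) / n
    return score


if __name__ == "__main__":
    reported = [2, 7, 12, 20]  # hypothetical PHQ-9 totals, not study data
    baseline = lopo_mean_baseline(reported)
    print(rmse(reported, baseline), nrmse(reported, baseline))
    print([phq9_category(s) for s in reported])
```

Because every held-out prediction uses only the other participants' scores, the baseline (like the models it is compared against) never sees the participant being evaluated.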

Data and code availability: Features and analysis code available at https://github.com/lbishal/mobilesensing. Raw logs are not publicly available due to privacy.

Key Findings

  • Significant correlations with PHQ-9: From communication logs, text messaging network size showed significant associations: incoming text network size r = 0.47 (p = 0.014) and outgoing text network size r = −0.44 (p = 0.026). Skewness coefficient for combined call+text correlated with PHQ-9 (r = 0.40, p = 0.045). From mobility, normalized entropy correlated significantly (r = 0.51, p = 0.026). From audio, conversational network size correlated with PHQ-9 (r = 0.49, p = 0.014); response time showed a trend (r = 0.39, p = 0.062). Higher depression severity was associated with smaller conversational network size and smaller text-based network sizes.
  • Group-wise behaviors: Both reactive (incoming > outgoing) and proactive (outgoing > incoming) phone communication patterns were observed; psychosis group tended to be more proactive. Depression and psychosis groups exhibited smaller text-based and in-person conversational network sizes than healthy controls; psychosis group had comparable call network size to healthy controls.
  • Complementarity across modalities: Audio conversational network size correlated with location entropy but was largely independent of other communication/mobility features, suggesting complementary information.
  • Regression performance (LOPO CV): Baseline RMSE 9.63 (nRMSE 0.36). Model A (communication + location): RMSE 6.80 (nRMSE 0.25). Model B (audio only): RMSE 8.53 (nRMSE 0.32). Model C (all features, early fusion): RMSE 6.07 (nRMSE 0.22) with Pearson correlation r = 0.76 (p < 0.001) between predicted and reported PHQ-9. Late fusion: RMSE 6.92 (nRMSE 0.26).
  • Five-class classification (F1, weighted): Baseline 0.25; Model A 0.34; Model B 0.34; Model C 0.46. Ordinal logistic regression outperformed multinomial logistic regression.
  • Binary classification (PHQ-9 ≥ 5): F1 scores were 0.50 (Model A), 0.57 (Model B), and 0.66 (Model C), against a baseline of 0.42.
  • Overall: adding free-living audio sociability features improved both continuous depression severity prediction and categorical classification over communication + location features alone, and substantially over baseline.
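
The reported regression numbers can be cross-checked with simple arithmetic. The sketch below computes the relative RMSE reduction of Model C and recomputes nRMSE from each RMSE. The divisor of 27 (the full PHQ-9 scale range) is an assumption, since the observed range in the cohort is not stated here; it is used only because it reproduces the reported nRMSE values after rounding. Variable names are illustrative.

```python
# Values copied from the regression results above.
reported_rmse = {"baseline": 9.63, "model_a": 6.80, "model_b": 8.53,
                 "model_c": 6.07, "late_fusion": 6.92}
reported_nrmse = {"baseline": 0.36, "model_a": 0.25, "model_b": 0.32,
                  "model_c": 0.22, "late_fusion": 0.26}

ASSUMED_RANGE = 27  # assumption: full PHQ-9 scale range, not a stated study value


def relative_reduction(before, after):
    """Fractional drop in error when moving from `before` to `after`."""
    return (before - after) / before


# Recompute nRMSE under the assumed range and compare with the reported value.
recomputed_nrmse = {name: round(value / ASSUMED_RANGE, 2)
                    for name, value in reported_rmse.items()}

gain_over_a = relative_reduction(reported_rmse["model_a"], reported_rmse["model_c"])
gain_over_baseline = relative_reduction(reported_rmse["baseline"], reported_rmse["model_c"])

if __name__ == "__main__":
    for name in reported_rmse:
        print(name, recomputed_nrmse[name], reported_nrmse[name])
    print(f"Model C vs Model A: {gain_over_a:.1%} lower RMSE")
    print(f"Model C vs baseline: {gain_over_baseline:.1%} lower RMSE")
```

In relative terms, adding audio features lowers RMSE by roughly 11% compared with communication + location features alone, and by roughly 37% compared with the mean baseline.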

Discussion

Mobile sensing features captured behavioral signatures of depression across a heterogeneous, transdiagnostic cohort of healthy individuals and patients with depressive and psychotic disorders. Communication behaviors related to sociability—particularly text messaging network size and asymmetry/skewness in incoming versus outgoing interactions—were associated with PHQ-9 severity. Mobility variability (normalized location entropy) was also associated with depression severity, though prior large-scale work suggests potential challenges for generalization beyond specific cohorts. Free-living audio contributed unique sociability information, with conversational network size significantly related to depression and largely uncorrelated with most communication/mobility features, indicating complementarity. Incorporating audio-based sociability features into multimodal models yielded the best performance (RMSE 6.07; nRMSE 0.22; r = 0.76), improving over communication + location alone and baseline. The ordinal nature of depression severity categories was important for classification, where ordinal logistic regression outperformed standard multinomial approaches. Results align with earlier patient-independent modeling, demonstrating that mobile sensing can assess depression severity in heterogeneous groups and that audio can enhance performance. Future research should explore generalization across sites, longitudinal within-person dynamics, and richer feature learning to improve robustness and scalability.

Conclusion

Mobile sensing provides a promising objective method for depression severity assessment across heterogeneous mental health conditions. Sociability features derived from free-living audio complement communication log and location data, improving both continuous PHQ-9 prediction and category classification. This work advances scalable, cross-sectional screening and monitoring of depressive symptoms and supports multimodal sensing, especially the inclusion of privacy-preserving audio, in future depression assessment tools. Further validation in larger, multi-site cohorts and longitudinal monitoring studies, along with exploration of representation learning and additional sensing modalities, is warranted to enhance generalizability and performance.

Limitations

  • Small sample size and single-site study limit generalizability; behavioral patterns may vary by region, culture, and setting.
  • Cross-sectional design precludes assessment of within-person changes over time; longitudinal studies are needed.
  • Potential confounders (e.g., personality, lifestyle, environment) can affect mobile sensing features; cross-sectional comparisons are susceptible to inter-individual variability.
  • Limited, hand-crafted feature set; automated feature learning and representation learning were not attempted because of the small dataset; end-to-end models with ordinal losses may improve performance on larger datasets.
  • GPS data were missing for some participants (imputed via KNN), which may introduce bias.
  • Continuous audio monitoring raises privacy and ethical concerns; although a privacy-preserving pipeline was used and acceptability was high in this cohort, broader acceptability and methods to derive informative features while protecting privacy require further study.
  • Prior evidence suggests some mobility features may not scale well to broad populations, emphasizing the need for multi-site validation and potentially invariant features.