Blunted vocal affect (BvA) and alogia, characterized by reduced vocal prosody and reduced verbal output respectively, are transdiagnostic features of serious mental illness (SMI) that impair quality of life and functional outcomes. Current clinical assessment relies on subjective ratings made during interviews. Computerized acoustic analysis offers the potential for objective, efficient 'digital phenotyping' of these symptoms, enabling automated assessment of symptom severity, treatment response, and relapse risk. However, prior studies using computerized acoustic analysis have shown only modest convergence with clinical ratings, potentially because they used small, constrained acoustic feature sets and did not consider the context of the speaking task. This study aimed to address these limitations by applying machine learning to a large feature set drawn from two distinct speaking tasks (a 20-second picture task and a 60-second free-recall task) to model clinically rated BvA and alogia in a large sample of SMI patients. The study hypothesized that a large feature set combined with machine learning would yield more accurate models and would reveal associations between BvA/alogia and demographic, diagnostic, and functional variables.
Literature Review
Numerous studies have explored the use of computerized acoustic analysis to measure BvA and alogia in schizophrenia and other SMI. However, these studies have generally yielded inconsistent and weak associations between acoustic measures and clinical ratings. Meta-analyses have reported large heterogeneity between studies and weak overall effects. These inconsistencies are likely due to methodological limitations, including the use of small and potentially non-comprehensive acoustic feature sets, and the lack of systematic consideration of the context of the speaking task. The current study aimed to address these limitations by utilizing a larger, more comprehensive feature set and accounting for the impact of speaking task using machine learning techniques.
Methodology
The study included 121 stable outpatients with SMI (schizophrenia, major depressive disorder, bipolar disorder, or other SMI) who completed two speaking tasks: a 20-second picture description task and a 60-second free-recall task. Their speech was recorded and analyzed using two acoustic analysis tools: the Computerized Assessment of Affect from Natural Speech (CANS) and the Extended Geneva Minimalistic Acoustic Parameter Set (GeMAPS). CANS provided 68 acoustic features related to speech production and variability, while GeMAPS provided 88 features. A total of 138 features were used in the analysis. Clinically rated BvA and alogia were assessed using the Scale for the Assessment of Negative Symptoms (SANS). Machine learning using Lasso regularized regression with 10-fold cross-validation was employed to model clinically rated BvA and alogia from the acoustic features. The accuracy of the models was evaluated, and the associations of both the ML-predicted scores and the clinical ratings with demographic, diagnostic, symptom, and functioning variables were examined. Stability selection analysis was used to identify the most stable acoustic features predictive of BvA and alogia.
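The modeling step described above (Lasso regularized regression tuned by 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the study's code or data: the feature matrix is synthetic, the sample and feature counts are placeholders, and the helper names (fit_lasso, cv_mse) and the candidate penalty values are assumptions chosen for illustration. The sketch uses a plain coordinate-descent Lasso written with the Python standard library so that the shrinkage step is visible.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, alpha, n_iter=30):
    """Coordinate-descent Lasso minimizing (1/2n)*||y - Xw - b||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centered features
    residual = [yi - y_mean for yi in y]  # residual when all weights are zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            new_w = soft_threshold(rho / n, alpha) / (col_sq[j] / n)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return w, intercept

def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def cv_mse(X, y, alpha, k=10, seed=0):
    """Mean squared error over k held-out folds for one penalty value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = fit_lasso([X[i] for i in train], [y[i] for i in train], alpha)
        preds = predict([X[i] for i in held_out], w, b)
        errors.extend((pred - y[i]) ** 2 for pred, i in zip(preds, held_out))
    return sum(errors) / len(errors)

# Synthetic stand-in for acoustic features (hypothetical sizes, not the study's data)
rng = random.Random(42)
n_speakers, n_features = 120, 10
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_speakers)]
# Only features 0 and 1 carry signal; the rest are noise
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

alphas = [0.01, 0.1, 0.5, 2.0]  # hypothetical candidate penalties
scores = {a: cv_mse(X, y, a) for a in alphas}
best_alpha = min(scores, key=scores.get)
weights, intercept = fit_lasso(X, y, best_alpha)
selected = [j for j, wj in enumerate(weights) if abs(wj) > 1e-8]
print("best alpha:", best_alpha, "selected features:", selected)
```

In practice a library implementation would normally replace the hand-written solver. A stability selection analysis could be layered on top by refitting on many random subsamples and counting how often each feature keeps a non-zero weight.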
Key Findings
The study achieved high accuracy in predicting clinically rated BvA and alogia using machine learning (90% and 95% accuracy, respectively, in the training sets, with similar accuracy in the test sets). Accuracy improved when the speaking tasks were analyzed separately. Predicted BvA/alogia scores showed high convergence with clinical ratings (r = 0.73 for BvA and r = 0.57 for alogia). ML-predicted scores were associated with poorer cognitive performance and social functioning and were significantly higher in individuals with schizophrenia than in those with depression or mania. However, the acoustic features identified as most predictive of BvA/alogia were not those considered conceptually critical to their operational definitions. For example, features related to Mel-frequency cepstral coefficients (MFCCs), spectral properties, and formant frequencies were more prominent in the models than pause times and intonation variability, which are often considered central to clinical ratings.
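The convergence values reported above are Pearson correlations between model-predicted scores and clinical ratings. As a rough illustration of how such a coefficient is computed, the sketch below uses invented ratings and predictions; the numbers are hypothetical and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical clinician ratings on a 0-5 severity scale and model predictions
clinical_ratings = [0, 1, 1, 2, 3, 3, 4, 5]
predicted_scores = [0.4, 0.8, 1.6, 1.9, 2.5, 3.4, 3.6, 4.4]

r = pearson_r(clinical_ratings, predicted_scores)
print(f"convergence r = {r:.2f}")
```

To estimate how well a model generalizes, the correlation would be computed on held-out predictions rather than on the data used for fitting.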
Discussion
The findings demonstrate the feasibility of using machine learning and a comprehensive set of acoustic features to accurately model clinically rated BvA and alogia from brief speech samples. The high accuracy achieved suggests that digital phenotyping holds promise for objective, efficient assessment of these negative symptoms. The task-specific nature of the models highlights the importance of considering contextual factors in assessing BvA/alogia. The discrepancy between the most predictive acoustic features and those considered conceptually critical to the operational definitions of BvA/alogia warrants further investigation. Clinicians may be utilizing a broader range of subtle vocal cues than currently captured in operational definitions. These findings highlight the potential of digital phenotyping to improve the efficiency and sensitivity of negative symptom assessment, and provide important information for refining clinical assessment tools and advancing research on the etiology and treatment of negative symptoms.
Conclusion
This study provides strong evidence for the feasibility of digitally phenotyping BvA and alogia using machine learning of vocal acoustic features from relatively short speech samples. The high accuracy achieved, even with task-specific models, shows promise for efficient and objective clinical applications. Future research should focus on refining the operational definitions of BvA and alogia, exploring a wider range of speaking tasks, and investigating the role of medication and other factors.
Limitations
The study's sample size, particularly for the free-recall task, was relatively small. The speech tasks were relatively constrained, potentially limiting the generalizability of the findings to more naturalistic settings. The study did not control for the effects of medication, which could influence vocal expression. Extreme levels of negative symptom severity were underrepresented in the sample.