Introduction
Distinguishing between major depressive disorder (MDD) and schizophrenia spectrum disorder (SSD) remains a challenge. While transdiagnostic and multivariate approaches have been explored, reproducible biomarkers remain elusive. Recent research highlights the potential of speech features as objective, reproducible, and efficient biomarkers for these disorders. Speech production involves the complex coordination of numerous muscles and neurobiological processes, making acoustic analysis a valuable tool for detecting abnormalities.

Prior studies have identified atypical acoustic measurements in both MDD and SSD, including variations in prosody, voice quality, spectral features, and temporal aspects. Meta-analyses have reported effects such as decreased spoken time, reduced speech rate, and increased pause duration in individuals with schizophrenia, often correlating with clinical ratings. Similar findings, such as decreased speech rate and increased pauses, have been observed in MDD. While the two disorders share features associated with negative symptoms, distinctions may arise from differences in positive symptoms such as formal thought disorder.

Most previous research employed null-hypothesis significance testing (NHST), which is limited in handling complex, multivariate data. Machine learning (ML) approaches offer a more robust alternative, enabling analysis of the intricate relationships between speech patterns and psychiatric disorders. Previous ML studies have shown promising results in classifying patients with MDD and SSD from healthy controls, but these often relied on large, difficult-to-interpret feature sets. Interpretable machine learning (IML) combines the advantages of NHST with the computational power of ML, facilitating insight into the underlying mechanisms. Permutation feature importance is a model-agnostic IML approach that identifies the features most crucial to classification by measuring how much a model's performance degrades when a feature's values are randomly shuffled. This study uses IML to investigate speech acoustics as objective classifiers for depression and schizophrenia, aiming to identify important features and assess their correlation with symptom severity.
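To make the permutation idea concrete, below is a minimal sketch of model-agnostic permutation feature importance using scikit-learn. The data, labels, and feature count are synthetic placeholders for illustration, not the study's dataset or exact setup.

```python
# Minimal sketch: model-agnostic permutation feature importance with scikit-learn.
# Data and labels are synthetic placeholders, not the study's dataset.
import numpy as np
from sklearn.svm import SVC
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(120, 5))                  # 120 samples, 5 acoustic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = SVC(kernel="poly", degree=3).fit(X_train, y_train)

# Shuffle each feature column and measure the drop in held-out accuracy;
# a large drop means the model relied heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} ± {result.importances_std[i]:.3f}")
```

Because the score drop is computed on held-out data with the trained model untouched, this procedure works with any classifier, which is what makes it model-agnostic.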
Literature Review
Previous research has explored speech acoustics as potential biomarkers for MDD and SSD, with studies reporting atypical acoustic measurements such as changes in prosody, voice quality, spectral features, and temporal aspects in both disorders. Meta-analyses have revealed consistent findings, such as decreased speech rate and increased pause duration in both MDD and SSD, although distinctions exist due to differences in positive symptoms. Most previous work employed null-hypothesis significance testing, which is limited in handling complex, multivariate data. Machine learning (ML) approaches offer a promising alternative, with studies demonstrating accurate classification of MDD and SSD from healthy controls. However, previous ML applications often relied on numerous abstract features, making interpretation challenging. Interpretable machine learning (IML), employing methods like permutation feature importance, promises to reveal which speech features matter and how they relate to symptom severity. The present study builds on this foundation, aiming for greater interpretability and a more nuanced understanding of the relationship between speech features and specific psychiatric symptoms.
Methodology
Participants (20 with SSD, 20 with MDD, and 20 healthy controls (HC)) were selected from the Marburg/Münster Affective Disorders Cohort Study, with groups matched for age and sex. Exclusion criteria included substance abuse, traumatic brain injury, neurological diseases, and low verbal IQ. Diagnoses were assessed using the German version of the Structured Clinical Interview for DSM-IV (SKID-I) and psychopathological scales. A picture description task based on the Thematic Apperception Test (TAT) was used to elicit spontaneous speech, yielding four speech samples per participant. Speech samples were segmented, and examiner speech and excessive noise were manually removed.

Feature extraction covered speech tempo, pause, prosodic intonation, prosodic stress, and speech spectrum features, along with pauses per minute (PPM), articulation coordination features (ACFs), and vocal quality features based on cepstral peak prominence (CPPs) and the low-to-high ratio (LHR). Three pairwise classification models (HC vs. SSD, HC vs. MDD, SSD vs. MDD) were trained using Support Vector Machines (SVMs) with three polynomial kernels (linear, 2-degree, and 3-degree), applying five-fold cross-validation with Bayesian hyperparameter optimization. Permutation feature importance was calculated to determine the relative importance of each feature for classification. Statistical relationships between the top 25% most important features and the three groups were assessed using ANOVA (or Mann-Whitney U tests if assumptions were not met), and Pearson correlations were calculated between these features and symptom severity scores (HAM-D, SANS, SAPS, and subscales).
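A condensed sketch of this classification setup appears below: a scaled polynomial-kernel SVM tuned by Bayesian search under five-fold cross-validation. The use of scikit-optimize's BayesSearchCV, the search ranges, and the feature count are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch of one pairwise model: polynomial-kernel SVM, 5-fold CV, Bayesian
# hyperparameter search. BayesSearchCV from scikit-optimize is an assumed
# stand-in for the paper's optimizer; data are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real, Categorical

rng = np.random.RandomState(1)
X = rng.normal(size=(40, 26))        # e.g., 20 HC vs. 20 MDD, illustrative feature count
y = np.array([0] * 20 + [1] * 20)    # labels for one of the three pairwise models

pipe = make_pipeline(StandardScaler(), SVC(kernel="poly"))
search = BayesSearchCV(
    pipe,
    {
        "svc__C": Real(1e-3, 1e3, prior="log-uniform"),
        "svc__degree": Categorical([1, 2, 3]),  # linear, 2-degree, 3-degree kernels
    },
    cv=5,        # five-fold cross-validation
    n_iter=25,
    random_state=1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

In this arrangement each candidate hyperparameter set is scored by cross-validated accuracy, so kernel degree is selected alongside the regularization strength rather than being fixed in advance.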
Key Findings
The SVM models with 2-degree and 3-degree polynomial kernels demonstrated high accuracy in classifying the three groups (accuracy >0.90). The most important features across all models, ranked by their importance in the 3-degree polynomial SVM, included articulation coordination features (ACF2, ACF1), intensity kurtosis, MFCC1, PPM, CPPs skewness, f0 SD, LHR SD, and LHR. Several features showed statistically significant differences between the patient groups and HC: ACF2, ACF1, MFCC1, PPM, talking rate, and CPPs SD differed significantly in both MDD and SSD compared to HC; intensity kurtosis, CPPs skewness, and LHR differed significantly in MDD compared to HC; and f0 SD and LHR SD differed significantly in SSD compared to HC. Moderate correlations were observed between some features and symptom severity scores in SSD: LHR SD with HAM-D and SAPS; CPPs skewness with SANS and SAPS FTD; intensity kurtosis with SAPS; and MFCC1 and PPM with SAPS FTD. The important features generally reflect articulation coordination, speech variability, and the number of pauses, suggesting that these aspects differ between HC and the clinical groups.
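As a rough illustration of this statistical follow-up, the sketch below runs a group comparison with a normality check (ANOVA, falling back to Mann-Whitney U) and a Pearson correlation against a severity scale. All data are synthetic, and the feature and scale names in the comments are examples only.

```python
# Sketch of the follow-up statistics on a top-ranked feature: group comparison
# with a normality check, then a correlation with symptom severity.
# All values are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
hc = rng.normal(0.0, 1.0, 20)    # e.g., PPM values for healthy controls
mdd = rng.normal(0.8, 1.0, 20)   # ...for the MDD group
ssd = rng.normal(1.0, 1.2, 20)   # ...for the SSD group

# Check normality per group; if violated, fall back to a non-parametric
# pairwise test (one pair shown here for brevity).
if all(stats.shapiro(g).pvalue > 0.05 for g in (hc, mdd, ssd)):
    stat, p = stats.f_oneway(hc, mdd, ssd)
else:
    stat, p = stats.mannwhitneyu(hc, ssd)
print(f"group test: statistic={stat:.2f}, p={p:.4f}")

# Correlate the feature with symptom severity (e.g., SAPS scores in SSD).
saps = rng.normal(20, 5, 20)     # placeholder severity ratings
r, p = stats.pearsonr(ssd, saps)
print(f"Pearson r={r:.2f}, p={p:.4f}")
```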
Discussion
The high classification accuracy of the SVM models supports the hypothesis that speech features can effectively discriminate between healthy controls and patients with MDD and SSD. The identified important features (articulation coordination, speech variability, and pause frequency) reflect psychomotor slowing, alogia, and flat affect, all related to the core symptoms of these disorders. The moderate correlations between features and symptom severity scores in SSD further substantiate the link between speech patterns and symptom expression. The inclusion of vocal quality features, such as those derived from CPPs and LHR, offers a novel avenue for improving classification accuracy and for understanding voice quality changes in these disorders. These findings have implications for developing objective, easily obtainable biomarkers for MDD and SSD. Future research can build on these results with multimodal data and more fine-grained symptom measures to further develop such biomarkers.
Conclusion
This study demonstrates the potential of speech and voice features as objective biomarkers for MDD and SSD. Interpretable machine learning models achieved high accuracy in classifying these disorders, and key features reflecting articulation coordination, speech variability, and pause frequency showed distinct group differences and moderate correlations with symptom severity. Future research should pursue multi-class classification, symptom severity prediction, diagnostic subtypes, and multimodal approaches to enable more refined diagnostic tools and monitoring of symptom change.
Limitations
The study's limitations include the relatively small sample size, potential confounders such as education level, and the use of a single speech task and language. The cross-sectional design and the potential impact of medication also limit the generalizability of the findings. Larger, more diverse samples with multiple speech tasks and longitudinal data are needed for confirmation and broader application.