Introduction
Language disturbance is a hallmark of psychosis, manifesting differently across the schizophrenia spectrum, from overt disorganization in severe cases to subtler variations in individuals with schizotypy or at clinical high risk. Speech, as an observable reflection of thought, provides valuable insight into underlying cognitive processes. Advances in digital phenotyping and computerized natural language processing (NLP) offer opportunities to objectively quantify speech patterns in ecologically valid settings, potentially improving assessment, treatment monitoring, and research into the psychosis disease process. However, the optimal application of NLP tools for identifying clinically meaningful linguistic parameters remains an area of ongoing investigation. Previous studies have employed various NLP techniques, such as Latent Semantic Analysis, graph analysis, and semantic density quantification, with varying success in characterizing language phenotypes in psychosis, primarily focusing on schizophrenia spectrum disorders (SSD). Some studies have identified decreased linguistic cohesion and predicted psychosis onset, while others have yielded negative or inconsistent results, highlighting the challenges in this field. Many earlier studies relied on older, non-contextual word embeddings (such as GloVe and Word2Vec), which cannot capture how a word's meaning shifts with context. This study aims to address these limitations by employing state-of-the-art NLP techniques, specifically Bidirectional Encoder Representations from Transformers (BERT), to analyze speech samples from individuals with SSD and healthy controls across three levels of linguistic analysis (individual words, parts-of-speech, and sentence-level coherence).
The researchers hypothesized that speech from individuals with SSD would exhibit abnormalities at each level of analysis and that NLP would outperform clinical rating scales in distinguishing between the two groups.
Literature Review
Existing literature on NLP applications in characterizing language in psychosis shows mixed results. While some studies successfully used Latent Semantic Analysis to quantify decreased coherence in SSD speech, predicting human ratings and discriminating between SSD and control participants with high accuracy (Elvevåg et al., 2007), others found no significant differences in linguistic cohesion between first-episode psychosis and healthy controls (Mackinley et al., 2020). Inconsistencies also emerged regarding the use of emotion words and the correlation between acoustic features and negative symptoms. Previous research often leveraged older NLP methods like GloVe and Word2Vec, which lack the contextual understanding provided by more advanced techniques. These older methods fail to differentiate between word meanings based on context, a critical limitation for nuanced linguistic analysis in clinical settings. Studies that did show success in applying NLP to the analysis of language in psychosis often used different NLP methodologies, different samples (e.g., varying in clinical presentation), and different outcome measures, making comparisons between them challenging. The inconsistencies in findings underscore the need for more sophisticated NLP approaches and a more standardized methodology to advance this research area.
Methodology
This study included two cohorts of participants: 20 with SSD and 11 healthy controls (HC). SSD participants were stable outpatients diagnosed with schizophrenia or schizoaffective disorder based on DSM-IV criteria. HC participants underwent the same diagnostic interviews and were free from major psychiatric disorders. Speech samples were collected through open-ended interviews; Cohort 1 discussed themselves, and Cohort 2 recounted positive and neutral memories. Recordings were transcribed verbatim, noting non-verbal vocalizations and disfluencies. A blinded psychiatrist rated the speech samples using the Scale for the Assessment of Thought, Language and Communication (TLC), assessing 18 items, a global language disorder score, and a total sum score. NLP analysis was conducted at three levels:
1. **Individual words:** Word-usage patterns were compared between the SSD and HC groups, considering pronouns, filler words, and incomplete words.
2. **Parts-of-speech (POS):** spaCy was used to automatically tag the POS of each word, and the counts of each POS category per 100 words were compared between groups.
3. **Sentence-level coherence:** BERT, a state-of-the-art embedding algorithm, was used to assess sentence-level coherence via two methods: (a) next-sentence predictability, in which BERT estimated the likelihood of each sentence following the preceding one, and (b) sentence embedding distance, measuring the distance between interviewer prompts and participant responses as an index of tangentiality.
Statistical analyses included the Shapiro-Wilk test for normality, the Wilcoxon rank-sum test for non-parametric comparisons, ANCOVA models for POS comparisons, and linear models for BERT next-sentence probability. Naive Bayes models were trained to compare the discriminating ability of clinical measures (TLC) and NLP features, with leave-one-out cross-validation used to evaluate model accuracy.
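The per-100-words POS normalization described above can be sketched in a few lines of Python. The `tagged_tokens` input and the toy sentence below are hypothetical stand-ins for the output of a tagger such as spaCy (whose `token.pos_` attribute yields coarse POS tags); the paper itself does not publish this code.

```python
from collections import Counter

def pos_counts_per_100_words(tagged_tokens):
    """Normalize part-of-speech counts to a per-100-words rate.

    `tagged_tokens` is a list of (word, pos_tag) pairs, e.g. built
    from a tagger such as spaCy via [(t.text, t.pos_) for t in doc].
    """
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(tag for _, tag in tagged_tokens)
    return {tag: 100.0 * n / total for tag, n in counts.items()}

# Hypothetical four-token utterance; each tag occurs once in four tokens,
# so each rate is 25 per 100 words.
sample = [("she", "PRON"), ("quickly", "ADV"), ("ran", "VERB"), ("home", "NOUN")]
rates = pos_counts_per_100_words(sample)
```

Normalizing to a fixed denominator like this is what makes POS counts comparable across interviews of different lengths.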
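The sentence-embedding-distance measure of tangentiality reduces to a vector distance between a prompt embedding and a response embedding. A minimal sketch, assuming sentence vectors have already been produced by an encoder such as BERT (the three-dimensional toy vectors below are invented stand-ins for real embeddings):

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy stand-ins for BERT sentence embeddings of an interviewer prompt
# and two possible participant responses.
prompt = [1.0, 0.0, 0.0]
on_topic = [0.9, 0.1, 0.0]
tangent = [0.1, 0.9, 0.2]

# A tangential reply sits farther from the prompt than an on-topic one.
assert cosine_distance(prompt, tangent) > cosine_distance(prompt, on_topic)
```

Higher prompt-to-response distances correspond to responses that drift away from the interviewer's topic, which is how the study operationalizes tangentiality.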
Key Findings
Clinical ratings of language disorder using the TLC showed no significant group differences between SSD and HC participants. However, NLP analyses revealed significant linguistic differences:
* **Individual words:** SSD participants used significantly more first-person singular pronouns, the filler word "uh," and incomplete words, while HC participants used more first-person plural pronouns and the filler word "um." The frequency of incomplete words alone discriminated between groups with an AUC of 0.88 and 90% accuracy.
* **Parts-of-speech (POS):** SSD participants used significantly fewer adverbs, adjectives, and determiners but more pronouns than HC participants. These differences persisted even after excluding outliers.
* **Sentence-level coherence:** BERT sentence embedding distances were significantly higher for SSD than HC participants, suggesting increased tangentiality in SSD. The next-sentence predictability analysis, however, showed no significant group differences.
Comparing model performance, the NLP-only model significantly outperformed the clinical-only model (87% vs. 68% accuracy) and performed comparably to the combined NLP-plus-clinical model (81%). The NLP models continued to outperform the clinical model even when education level was included as a predictor.
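The model comparison above rests on leave-one-out cross-validation: each participant is held out in turn, a naive Bayes model is fit on the rest, and accuracy is the fraction of held-out participants classified correctly. A pure-Python sketch with a one-feature Gaussian naive Bayes follows; the study used naive Bayes over multiple NLP features, and the feature values below are invented, well-separated toy numbers, not the paper's data.

```python
import math
import statistics

def loocv_accuracy(features, labels):
    """Leave-one-out cross-validation accuracy for a one-feature
    Gaussian naive Bayes classifier (illustrative sketch only)."""
    correct = 0
    n = len(features)
    for held_out in range(n):
        train = [(x, y) for i, (x, y) in enumerate(zip(features, labels))
                 if i != held_out]
        best_label, best_score = None, -math.inf
        for label in set(labels):
            xs = [x for x, y in train if y == label]
            mean = statistics.mean(xs)
            sd = max(statistics.stdev(xs) if len(xs) > 1 else 1.0, 1e-6)
            prior = len(xs) / len(train)
            x = features[held_out]
            # Log posterior up to a constant: log prior + log Gaussian likelihood.
            score = math.log(prior) - math.log(sd) - (x - mean) ** 2 / (2 * sd * sd)
            if score > best_score:
                best_label, best_score = label, score
        correct += best_label == labels[held_out]
    return correct / n

# Toy stand-in for a single discriminative feature
# (e.g. incomplete words per 100 words), with fabricated values:
feature = [0.1, 0.2, 0.15, 0.1, 2.0, 2.2, 1.9, 2.1]
group = ["HC", "HC", "HC", "HC", "SSD", "SSD", "SSD", "SSD"]
```

With only 31 participants in the study, leave-one-out is a natural choice: it maximizes training data per fold while still scoring every participant out-of-sample.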
Discussion
This study demonstrates the sensitivity of NLP methods in detecting sub-clinical linguistic differences in SSD that may not be captured by traditional clinical rating scales. The significant differences in pronoun usage, filler words, incomplete words, and sentence-level coherence identified by NLP suggest potential biomarkers for SSD. The superior performance of NLP models in discriminating between SSD and HC participants highlights the potential value of NLP in early detection and objective assessment of SSD. The findings align with previous research on increased pronoun use and decreased fluency in SSD, while also revealing novel insights into the use of filler words and incomplete words. However, the inconsistent findings regarding sentence-level coherence warrant further investigation, perhaps necessitating the exploration of alternative methods for analyzing discourse structure. The study's findings support the increasing recognition of NLP as a valuable tool in psychiatric research, offering objective and sensitive measures of linguistic features that may reflect underlying cognitive and neurobiological abnormalities.
Conclusion
This exploratory study successfully applied state-of-the-art NLP methods, particularly BERT, to characterize subtle linguistic differences between individuals with SSD and healthy controls. The findings suggest NLP's potential to provide clinically relevant and informative biomarkers for SSD, exceeding the sensitivity of traditional clinical ratings in detecting sub-clinical language disturbances. Future studies with larger, more diverse samples are needed to validate these findings and explore the clinical utility of these NLP measures in diagnostic and prognostic settings. Further research could also investigate the relationship between specific linguistic features identified by NLP and underlying cognitive and neurobiological mechanisms in SSD.
Limitations
The primary limitation of this study is the relatively small and heterogeneous sample size, potentially limiting the generalizability of the findings. The exploratory nature of the study also meant that multiple comparisons were not corrected, increasing the risk of Type I errors. The low number of participants with clinically evident thought disorder restricted the analysis of correlations between NLP and TLC measures. The reliance on transcribed speech, rather than live interaction, might also influence the results. Future studies should address these limitations by employing larger, more diverse samples, using standardized data collection and analysis procedures, and incorporating longitudinal data to examine the relationship between NLP measures and disease progression. Additionally, the exploration of alternative NLP methods and the investigation of the impact of medication effects on language abnormalities warrant further research.