Introduction
Psychotic disorders, in which formal thought disorder (FTD) manifesting as disorganized speech is a core feature, typically emerge in late adolescence or early adulthood and are often preceded by a clinical high-risk (CHR-P) phase. Although clinical, cognitive, neuroimaging, and blood markers are associated with transition to psychosis in CHR-P individuals, accurate, non-invasive, and easily translatable predictive tools are still needed; such tools could facilitate preventive interventions. Recent research has explored automated methods that quantify speech disorganization using natural language processing (NLP), which offer greater scalability and objectivity than traditional qualitative assessments. Prior studies have used techniques such as Latent Semantic Analysis (LSA) to assess semantic coherence, tangentiality (off-topic speech), and semantic similarity, while other approaches quantify referential cohesion or use graph-theoretical methods to represent the structure of speech. These automated approaches have shown promise in distinguishing people with psychosis from controls and even in predicting psychosis onset in CHR-P individuals. However, the optimal speech assessment strategy (e.g., stimulated vs. free speech) and the most informative NLP measures remain unclear. This study aimed to address these gaps by investigating how well twelve NLP measures distinguish CHR-P, first-episode psychosis (FEP), and healthy control participants across different speech elicitation tasks: the Thematic Apperception Test (TAT), the Discourse Comprehension Test (DCT), and free speech. The researchers hypothesized that a combination of NLP measures would identify patterns associated with psychosis more effectively than individual measures.
Literature Review
The literature review cites extensive previous research on NLP applications in the diagnosis and prediction of psychosis. It highlights several studies that used LSA to measure semantic coherence, tangentiality, and other aspects of disorganized speech, and mentions word and sentence embeddings, alongside graph-theoretical approaches, as alternative ways of quantifying speech disorganization. The review summarizes evidence that these automated methods can discriminate between individuals with psychotic disorders and healthy controls and may predict future psychosis in at-risk individuals. The authors note that existing studies have used a limited set of NLP measures, and they raise questions about the optimal method for eliciting speech, particularly the choice between stimulated and free speech paradigms.
Methodology
The study involved three groups matched for age and sex: 25 CHR-P participants, 16 FEP patients, and 13 healthy controls. CHR-P participants were recruited from the OASIS service and met ultra-high-risk criteria on the Comprehensive Assessment of At-Risk Mental States (CAARMS). FEP patients were recruited from the South London and Maudsley NHS Foundation Trust, and healthy controls had no psychiatric history. All participants provided written informed consent, and ethical approval was obtained. CHR-P participants were followed up for an average of 7 years to assess transition to psychosis, defined as the onset of persistent psychotic symptoms.
The primary analysis used transcripts of speech elicited with the TAT: participants described each of eight pictures for one minute, and prompts were used to encourage them to keep speaking. Speech was transcribed by a trained assessor blind to group status. The same procedure was repeated for the DCT (story retelling) and free speech tasks. The Thought and Language Index (TLI), the Positive and Negative Syndrome Scale (PANSS), IQ estimated with the Wide Range Achievement Test (WRAT), the Wechsler Adult Intelligence Scale Digit Span test, and years of education were also recorded.
Twelve NLP measures were calculated for each speech excerpt: total words (Nword), total sentences (Nsent), mean words per sentence (Nword/Nsent), semantic coherence (using word2vec and SIF embeddings), tangentiality, on-topic score, repetition (maximum cosine similarity between sentences), ambiguous pronouns, and four speech graph connectivity measures (LCC and LSC, the sizes of the largest connected and largest strongly connected components, together with their LCCr and LSCr variants). Group differences were tested with Mann-Whitney U-tests, which do not assume normally distributed data. Linear regression was used to assess relationships between NLP measures while controlling for group membership, and generalized additive models for location, scale and shape (GAMLSS) were used to test whether group differences were robust to controlling for IQ, education, and digit span. The effects of medication and of the number of prompts were also investigated.
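The embedding-based measures listed above can be illustrated with a minimal Python/NumPy sketch. It is not the study's code and rests on stated assumptions: `word_vecs` is a toy stand-in for pretrained word2vec vectors; sentence vectors use SIF-style weighting a/(a + p(w)) with optional removal of the first principal component; semantic coherence is taken as the mean cosine similarity between consecutive sentences; repetition is the maximum cosine similarity between any two sentences, as defined above; and the on-topic score and tangentiality are assumed to be, respectively, the mean similarity of each sentence to a reference vector for the stimulus and the slope of that similarity over sentence position. All function names are hypothetical.

```python
import numpy as np


def sif_embeddings(sentences, word_vecs, word_freq, a=1e-3, remove_pc=True):
    """SIF-style sentence embeddings: frequency-weighted average of word vectors.

    word_vecs: dict word -> vector (toy stand-in for word2vec, an assumption).
    word_freq: dict word -> unigram probability p(w); missing words get p(w) = 0.
    """
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        tokens = [w.strip(".,;:!?\"'") for w in sentence.lower().split()]
        words = [w for w in tokens if w in word_vecs]
        if not words:
            continue  # no known words: the sentence keeps a zero vector
        # Down-weight frequent words with weight a / (a + p(w)).
        weights = np.array([a / (a + word_freq.get(w, 0.0)) for w in words])
        vecs = np.array([word_vecs[w] for w in words], dtype=float)
        emb[i] = weights @ vecs / len(words)
    if remove_pc and len(sentences) > 1:
        # Remove the common component (first right singular vector),
        # estimated here only from the sentences passed in.
        pc = np.linalg.svd(emb, full_matrices=False)[2][0]
        emb = emb - np.outer(emb @ pc, pc)
    return emb


def _unit(x):
    """Row-normalise to unit length, guarding against zero vectors."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def semantic_coherence(emb):
    """Mean cosine similarity between consecutive sentence embeddings."""
    u = _unit(emb)
    return float(np.mean(np.sum(u[:-1] * u[1:], axis=1)))


def repetition(emb):
    """Maximum cosine similarity between any two distinct sentences."""
    u = _unit(emb)
    sims = u @ u.T
    upper = np.triu_indices(len(u), k=1)  # each sentence pair once, no self-pairs
    return float(sims[upper].max())


def on_topic(emb, topic_vec):
    """Mean cosine similarity between each sentence and a reference topic vector."""
    return float(np.mean(_unit(emb) @ _unit(topic_vec)[0]))


def tangentiality(emb, topic_vec):
    """Slope of sentence-to-topic similarity over sentence position.

    A more negative slope means speech drifts further from the topic over time.
    """
    sims = _unit(emb) @ _unit(topic_vec)[0]
    positions = np.arange(len(sims))
    return float(np.polyfit(positions, sims, 1)[0])
```

In practice, `topic_vec` might be an embedding of a reference description of each TAT picture built with the same word vectors. Coherence, repetition, and tangentiality need at least two sentences per excerpt.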
Key Findings
Analysis of TAT speech revealed significant group differences on several NLP measures. Compared with controls, FEP patients showed significantly lower semantic coherence, shorter sentences, and more sentences. The on-topic score showed larger group differences than tangentiality, indicating that, on average, FEP patients' responses were less related to the picture descriptions. Speech graph connectivity was also reduced in FEP patients. CHR-P participants showed lower semantic coherence and on-topic scores than controls. When controlling for IQ, LSC and LSCr differed significantly between CHR-P participants who transitioned to psychosis and those who did not.
Most NLP measures were only weakly correlated with one another, suggesting that they carry complementary information; the speech graph measures were the exception, showing strong intercorrelations. LSC correlated negatively with maximum similarity (repetition) and positively with the on-topic score, whereas speech graph measures were not significantly associated with semantic coherence. After correction for multiple comparisons, the TLI negative score was significantly associated with several NLP measures (number of words, LCC, LCCr, LSC, and LSCr).
After controlling for IQ or years of education, some group differences were no longer significant, particularly those involving LSCr, consistent with the association between LSCr and IQ and education reported in prior research. Analyses of the DCT and free speech data replicated some of these findings with weaker group differences than the TAT; because the differences were weakest for free speech, the stimulated tasks (TAT and DCT) appear more sensitive for assessing thought disorder. The number of prompts administered differed significantly across groups, with FEP patients receiving the most.
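The speech graph results above (reduced connectivity in FEP, and LSC correlating negatively with repetition and positively with on-topic score) depend on how the graphs are built. In the speech graph approach used in this literature, each distinct word is a node and consecutive words are linked by directed edges; LCC is the number of nodes in the largest connected component, and LSC is the number in the largest strongly connected component, i.e. words that can be reached from one another by following transitions in both directions. The standard-library Python sketch below computes both for a whole excerpt. Its tokenisation, the choice to link words only within an utterance (e.g. one per picture), and the absence of sliding word windows are illustrative assumptions, and published implementations may differ. The LCCr and LSCr variants are not reproduced because the summary does not specify how they are derived. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_speech_graph(utterances):
    """One node per distinct word; a directed edge for each pair of
    consecutive words within the same utterance (an assumed design choice)."""
    nodes, edges = set(), []
    for utterance in utterances:
        words = [w.strip(".,;:!?\"'").lower() for w in utterance.split()]
        words = [w for w in words if w]
        nodes.update(words)
        edges.extend(zip(words, words[1:]))
    return nodes, edges


def largest_connected_component(nodes, edges):
    """LCC: size of the largest weakly connected component (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # edge direction is ignored for weak connectivity
    return max(Counter(find(n) for n in nodes).values(), default=0)


def largest_strongly_connected_component(nodes, edges):
    """LSC: size of the largest strongly connected component (iterative Kosaraju)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, finish_order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                stack.pop()
                finish_order.append(node)
            elif child not in visited:
                visited.add(child)
                stack.append((child, iter(succ[child])))
    # Pass 2: DFS on the reversed graph in reverse finishing order;
    # each tree found is exactly one strongly connected component.
    assigned, largest = set(), 0
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for prev in pred[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        largest = max(largest, size)
    return largest


def speech_graph_connectivity(utterances):
    """Return (LCC, LSC) for a list of utterances."""
    nodes, edges = build_speech_graph(utterances)
    return (largest_connected_component(nodes, edges),
            largest_strongly_connected_component(nodes, edges))
```

For example, in "The dog saw the cat" the words "the", "dog", and "saw" form a loop (the → dog → saw → the), giving an LSC of 3, while "cat" joins the connected component without closing a loop. Because every word in an unbroken utterance is linked, LCC in this construction largely tracks the number of distinct words.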
Discussion
The findings demonstrate that several NLP measures can differentiate healthy controls, CHR-P individuals, and FEP patients. That some differences between FEP patients and controls (in semantic coherence, sentence length, and on-topic score) remained after controlling for IQ and education suggests that these measures have potential clinical value. The weak correlations between most NLP measures suggest that they provide complementary information, supporting the use of multiple measures for a more comprehensive assessment of thought disorder. The greater sensitivity of the TAT and DCT compared with free speech underscores the importance of choosing the speech elicitation method carefully. Future work should use larger samples and focus on individual-level prediction with machine learning approaches to test the clinical utility of these measures.
Conclusion
This study provides evidence that automated NLP analysis of speech is a promising approach to the assessment and prediction of psychosis. The findings identify several NLP measures that distinguish healthy controls, CHR-P individuals, and FEP patients, and show the importance of carefully selecting speech elicitation tasks. Future research with larger datasets and machine learning approaches is needed to establish the clinical applicability of these methods for individual-level risk prediction and early intervention.
Limitations
Limitations include the relatively small sample size, which increases the risk of type II errors, and the need for replication in larger cohorts to establish generalizability. The study included FEP patients but not people with chronic psychosis, so acute and chronic FTD could not be compared. The groups also differed on potential confounders (medication, IQ, education, digit span, and number of prompts). Controlling for some of these variables did not qualitatively change the main findings, but it did affect the statistical significance of some results. Future studies should examine these relationships further and test whether automated language markers add predictive power beyond established cognitive measures. Finally, task presentation order was not randomized, and inaudible speech was significantly more frequent in the free speech recordings than in the stimulated speech tasks.