Introduction
Mental health disorders are a significant public health concern, particularly among young people. Many young people do not access care because of stigma, lack of awareness, and other barriers to access. Chat-based counseling hotlines offer a low-threshold intervention, providing immediate support and potentially serving as an entry point to further treatment. However, identifying individuals who need additional help is crucial for efficient resource allocation. This study explores the feasibility of using NLP to predict recurrent chat contact, a behavioral marker of unmet needs, after an initial consultation. The large text corpora generated by these services are a rich source of information that NLP could analyze to improve care pathways. While NLP has shown promise in other mental health settings (e.g., predicting readmission, identifying suicidal behavior), its application to chat-based counseling remains under-explored, especially for predicting recurrent contact as an indicator that further help is required. This research addresses that gap by leveraging a substantial dataset from a German 24/7 chat counseling service to build a predictive model and explore the factors associated with recontact.
Literature Review
Existing research on NLP applications in mental healthcare demonstrates its potential in various areas, including predicting readmission in patients with major depressive disorder, detecting suicidal behavior, extracting symptom information, and stratifying patients. Studies have also used non-clinical data, such as social media posts, to monitor mental health. However, work applying NLP specifically within chat-based counseling services is limited. Some studies have used NLP to identify effective counseling strategies or to classify distress levels, but these often rely on smaller datasets and rarely address the prediction of recurrent contact as a key indicator of unmet needs that could guide more appropriate and effective support. The present study expands this line of research by focusing on the prediction of recurrent contact as a pivotal element of a stepped-care approach for youth mental health.
Methodology
This preregistered study (OSF: XA4PN) used anonymized data from the German 24/7 chat counseling service "krisenchat." The dataset comprised approximately 813,000 messages from 18,871 unique chatters who received a first consultation between October 2021 and December 2022. The data underwent rigorous anonymization to protect user privacy, including randomly shuffling the order of words within each conversation. A time-based split divided the data into a training set (14,929 consultations) and a test set (3,942 consultations). An XGBoost classifier was trained on a vectorized word representation, with hyperparameters optimized via repeated cross-validation and Bayesian optimization. Model performance was evaluated with the AUROC score on the unseen test set, and SHAP values were used to interpret the model's predictions and identify important predictors of recurrent contact. A clustering approach based on a pre-trained Word2Vec model grouped similar word stems to aid interpretation of the model's outputs. A baseline model using word stem counts and time of contact was developed for comparison. Ethical and data-privacy considerations were managed throughout the study.
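The paper does not publish code; the following Python sketch merely illustrates how such a pipeline could be assembled, assuming a bag-of-words representation of word stems (scikit-learn's CountVectorizer), an XGBoost classifier, and Bayesian hyperparameter search with scikit-optimize over repeated cross-validation. The file names, column names ("text", "recontact"), and search ranges are illustrative assumptions, not the study's actual configuration.

```python
# Minimal, illustrative pipeline: bag-of-words features + XGBoost,
# tuned with Bayesian optimization over repeated cross-validation.
# File names, column names and search ranges are assumptions, not
# the study's actual configuration.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

# Time-based split: earlier consultations form the training set,
# later ones the test set (hypothetical file names).
train = pd.read_csv("consultations_train.csv")
test = pd.read_csv("consultations_test.csv")

pipeline = Pipeline([
    ("vectorizer", CountVectorizer(min_df=5)),       # word-stem counts
    ("clf", XGBClassifier(eval_metric="logloss")),
])

search = BayesSearchCV(
    pipeline,
    search_spaces={
        "clf__n_estimators": Integer(100, 1000),
        "clf__max_depth": Integer(2, 8),
        "clf__learning_rate": Real(1e-3, 0.3, prior="log-uniform"),
    },
    scoring="roc_auc",
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0),
    n_iter=30,
    random_state=0,
)
search.fit(train["text"], train["recontact"])

# Evaluate once on the held-out, later-in-time test set.
probs = search.predict_proba(test["text"])[:, 1]
print("Test AUROC:", roc_auc_score(test["recontact"], probs))
```

The time-based split mirrors deployment: the model is tuned only on earlier consultations and evaluated on later, unseen ones.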
Key Findings
The best-performing XGBoost model achieved an AUROC score of 0.68 (p < 0.01) on the unseen test set, significantly exceeding chance. Using a default threshold of 0.5, the model achieved a balanced accuracy of 0.62 and an accuracy of 0.65; precision was 0.62 and recall was 0.44. SHAP value analysis revealed that words indicating younger age (e.g., "12," "13") and female gender, along with terms related to self-harm ("verletzt," hurt/injured), suicidal thoughts ("sterben," to die; "suizid" and "selbstmord," suicide), and time (e.g., "tagsüber," during the day; "nacht," night), were associated with a higher probability of recontact. In contrast, words related to work and employment ("job"; "arbeite," work) were associated with a lower probability. The analysis also highlighted the predictive value of counselor messages: words such as "professionell" (professional), "internetseelsorg" (a stem referring to "Internetseelsorge," online pastoral counseling), and "rat" (advice) predicted a lower likelihood of recontact, possibly reflecting successful redirection to other care pathways. Further analysis of word co-occurrence showed that counselor use of words like "suicide" often coincided with chatter mentions of "thoughts," indicating a potential reframing of the conversation. Overall, the predictive signal was widely distributed across many features, indicating a complex interplay of factors influencing recontact.
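To make the SHAP-based interpretation concrete, the sketch below shows one common way to extract global word-stem importances from a fitted XGBoost model with the shap library. It builds on the hypothetical pipeline sketched in the Methodology section (the `search` and `test` objects are assumed to come from there) and does not reproduce the study's actual analysis; in the paper, similar per-word contributions were further grouped using Word2Vec-based clustering.

```python
# Illustrative SHAP interpretation of the fitted model from the sketch above.
# `search` and `test` are assumed to exist as in that sketch; feature names
# and outputs do not correspond to the study's actual artifacts.
import numpy as np
import shap

best = search.best_estimator_
vectorizer = best.named_steps["vectorizer"]
model = best.named_steps["clf"]

# Dense array for simplicity; a sparse matrix may be preferable at scale.
X_test = vectorizer.transform(test["text"]).toarray()

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # one value per word stem per consultation

# Global importance: mean absolute SHAP value per word stem.
mean_abs = np.abs(shap_values).mean(axis=0)
stems = vectorizer.get_feature_names_out()
for stem, value in sorted(zip(stems, mean_abs), key=lambda x: -x[1])[:20]:
    print(f"{stem}\t{value:.4f}")
```

The sign of an individual SHAP value indicates whether a word stem pushed a given prediction toward or away from recontact, which is how predictors such as age- and self-harm-related terms were identified as increasing the predicted probability.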
Discussion
The study's findings demonstrate the feasibility of using NLP to predict recurrent chat contact in a real-world youth mental health setting. The achieved predictive performance, while not perfect, is comparable to or surpasses results from other studies using NLP to predict outcomes of digital interventions. That the model performed significantly better than chance even after rigorous anonymization suggests that language itself contains valuable information for identifying individuals who might benefit from additional support. The identified predictors (age, gender, self-harm, and suicidal ideation) align with clinical expectations, lending credibility to the model. However, it is important to note that the model does not directly measure symptom severity, and that gender differences in recontact might be affected by factors beyond unmet need, such as service satisfaction or the difficulty of matching chatters with the same counselor in subsequent interactions.
Conclusion
This study provides evidence for the potential of NLP to improve personalized care in chat-based mental health services for youth. The ability to predict recurrent contact would allow proactive interventions, such as redirection to more intensive care or tailored support. Future research should focus on refining the model by incorporating additional features (e.g., non-textual data), exploring different NLP models (e.g., transformer-based models), comparing the algorithm's performance to human clinicians, and evaluating the impact of prediction-based interventions on clinical outcomes. Qualitative analysis of the reasons for recontact would also help clarify how best to use these predictions to support individuals needing further help.
Limitations
Several limitations should be considered. First, counselors were not explicitly instructed to limit consultations per chatter, which may have influenced recontact. Second, gender differences in recontact may reflect factors besides unmet need. Third, the outcome measure might not capture all instances of unmet need, since dissatisfied users may not return despite needing further help. Fourth, the outcome is not strongly associated with absolute symptom severity. Fifth, data anonymization limited the ability to fully interpret the model because the order of words within each conversation was randomized. Sixth, inherent biases in the pre-trained Word2Vec model used for clustering might affect the interpretation of the results. Finally, the time-based train-test split could be influenced by seasonal factors.