Twitter provides a platform for mental health discussions, fostering community and peer support, but it also facilitates the spread of stigmatising attitudes that can deter help-seeking. Reliably identifying these harmful tweets is important but difficult given the sheer volume of posts, and machine learning offers a potential route to automated identification and mitigation. Previous research has used social media data to identify symptoms of depression through sentiment analysis, but such models can be distorted by how the data were collected and by human biases in the training data. This proof-of-principle study investigates whether machine learning, supervised by service users, can reliably identify stigmatising tweets about schizophrenia, one of the most stigmatised conditions on Twitter; service user involvement is prioritised throughout to minimise bias and ensure the model's acceptability.
Literature Review
Existing research demonstrates the negative impact of stigma on mental health help-seeking. Machine learning models have shown promise in analysing social media data for mental health indicators, particularly depression, but these models can be susceptible to ascertainment bias and to human biases in their training data. A crucial element in addressing this is the integration of service user perspectives, which can reduce bias and increase the acceptability of the resulting models. This study builds on prior work investigating stigma in other mental health conditions and aims to establish a more reliable and ethically sound approach.
Methodology
The study followed a service-user-centered machine learning pipeline, adhering to the Community Principles on Ethical Data Practices (CPEDP), with a young person's mental health advisory group and service user researchers involved throughout. A total of 13,313 tweets containing schizophrenia-related keywords were collected between January and May 2018. Two service user researchers manually coded 746 English tweets for stigma (kappa = 0.75). Eighty percent of these coded tweets were used to train eight machine learning models, including Random Forest, Gradient Boosting, k-nearest neighbours, Naive Bayes, and support vector machines (SVMs) with linear, sigmoid, and polynomial kernels. Model performance was evaluated on the service user-defined criterion of fewest false negatives, alongside AUC and accuracy; a minimal sketch of this training and evaluation step is given below. The two best-performing models then underwent blind and unblind validation against additional service user codings. The best-performing model, a linear SVM, was applied to the full corpus of English tweets (n = 12,145) to estimate the prevalence of stigma. Statistical analyses (t-tests, ANOVA) compared sentiment, subjectivity, and other features between stigmatising and non-stigmatising tweets, and examined geographic variation in stigma.
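The following is a minimal sketch, in Python with scikit-learn, of the kind of training and evaluation step described above. The file name, column names, and the TF-IDF text-feature step are illustrative assumptions rather than details taken from the study.

```python
# Hypothetical sketch of the model-selection step: train a linear-kernel SVM on
# service-user-coded tweets and report accuracy, AUC, and false negatives.
# "coded_tweets.csv" and its columns are assumed for the example.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

coded = pd.read_csv("coded_tweets.csv")  # columns: "text", "stigma" (0/1)
X_train, X_test, y_train, y_test = train_test_split(
    coded["text"], coded["stigma"],
    test_size=0.2, stratify=coded["stigma"], random_state=42)

# Linear-kernel SVM, the model the study ultimately selected.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", probability=True))
model.fit(X_train, y_train)

pred = model.predict(X_test)
scores = model.predict_proba(X_test)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"accuracy={accuracy_score(y_test, pred):.2f}  "
      f"AUC={roc_auc_score(y_test, scores):.2f}  false negatives={fn}")
```

The same pipeline can be re-fitted with other estimators (Random Forest, Gradient Boosting, k-nearest neighbours, Naive Bayes, other SVM kernels) and the candidates compared on the false-negative count that the service users prioritised.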
Key Findings
Inter-rater reliability for the service user coding of tweets was high (kappa = 0.75). Stigmatising tweets showed significantly more negative sentiment and greater subjectivity than non-stigmatising tweets. The linear SVM, selected on the service user-prioritised criterion of minimising false negatives, achieved 91% accuracy, and in the validation tests it consistently outperformed the random forest on false negatives. When applied to the full dataset, the SVM classified 47% (n = 5676) of English tweets as stigmatising, and these tweets again showed significantly more negative sentiment. Cross-country analysis revealed variation in stigma prevalence and sentiment, with the USA showing a high proportion of stigmatising tweets (47.6%) and more negative sentiment than Canada and the UK. A sketch of the sentiment and subjectivity comparison follows.
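As an illustration of the reported sentiment and subjectivity comparison, the sketch below scores tweets with TextBlob polarity and subjectivity and runs an independent-samples t-test between the two classes. TextBlob, the file name, and the label values are assumptions made for the example; the study does not specify which tools it used.

```python
# Illustrative comparison of sentiment between stigmatising and
# non-stigmatising tweets; "classified_tweets.csv" and its columns are assumed.
import pandas as pd
from textblob import TextBlob
from scipy import stats

tweets = pd.read_csv("classified_tweets.csv")  # columns: "text", "label"
tweets["polarity"] = tweets["text"].apply(lambda t: TextBlob(t).sentiment.polarity)
tweets["subjectivity"] = tweets["text"].apply(lambda t: TextBlob(t).sentiment.subjectivity)

stigma = tweets[tweets["label"] == "stigmatising"]
other = tweets[tweets["label"] == "non-stigmatising"]

# Welch's t-test on sentiment polarity between the two groups.
t, p = stats.ttest_ind(stigma["polarity"], other["polarity"], equal_var=False)
print(f"mean polarity: stigma={stigma['polarity'].mean():.3f}, "
      f"non-stigma={other['polarity'].mean():.3f}, t={t:.2f}, p={p:.4f}")
```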
Discussion
This study demonstrates the feasibility of using service user-supervised machine learning to identify schizophrenia-related stigma on Twitter. The high prevalence of stigma (47%) underscores the need for targeted interventions. The linear SVM's performance, with its emphasis on minimising false negatives, reflects the priorities that service users set for the model. The findings highlight the potential of machine learning for large-scale, near real-time monitoring of online stigma and for evaluating the impact of anti-stigma campaigns. Compared with traditional methods such as surveys, this approach scales more readily and avoids problems such as low response rates.
Conclusion
This study successfully demonstrates a service-user-driven machine learning approach for identifying schizophrenia-related stigma on Twitter, revealing a surprisingly high prevalence of stigmatising content. The findings highlight the urgent need for online education and targeted campaigns, with machine learning offering a powerful tool for monitoring their effectiveness. Future research should focus on expanding the training dataset to improve model generalization, exploring different languages and cultural contexts, and refining the model to further minimize both false positives and false negatives.
Limitations
The study's limitations include the use of a relatively small training dataset, potentially affecting the model's generalizability. The model was trained on English tweets only, limiting its applicability to other languages. The reliance on the Twitter Streaming API might have resulted in an incomplete representation of all relevant tweets. Further research with larger and more diverse datasets is needed to address these limitations and improve the accuracy and generalizability of the model.