Medicine and Health
Natural language processing of multi-hospital electronic health records for public health surveillance of suicidality
R. Bey, A. Cohen, et al.
This study by Romain Bey, Ariel Cohen, and colleagues unveils concerning trends in suicidality among youth by analyzing over 2.9 million electronic health records from Parisian hospitals. It highlights a significant post-pandemic rise in suicide attempts, particularly among adolescent girls, calling for heightened awareness and action regarding mental health.
~3 min • Beginner • English
Introduction
The study addresses the need for timely population-level mental health surveillance, heightened by concerns during the COVID-19 pandemic about increases in suicide attempts (SA). Early evidence was mixed, and only later did consensus emerge that youth, particularly girls, were most affected. Traditional data sources and timelines impeded rapid, targeted interventions. Advances in NLP applied to large-scale textual data, especially EHRs, offer the potential for real-time indicators but face challenges of generalisability, multi-hospital integration, privacy, and local heterogeneity. The research question was whether NLP applied to multi-hospital EHRs can compute timely, robust indicators of suicidality at a population level, detect known pandemic-related changes in SA, and shed light on associated risk factors.
Literature Review
Prior work has primarily leveraged social media for mental health surveillance during COVID-19, detecting changes in posted content but with limited clinical applicability due to selection biases, inadequate stratification, and limited outcome coverage. Clinical NLP has shown promise in extracting suicidality-related variables from EHRs, but many models were trained on narrow cohorts (disease-, age-, or hospital-specific), raising generalisability concerns. Multi-hospital EHR analyses are hindered by privacy, technical barriers, and local documentation differences. The literature supports the feasibility of NLP for suicidality detection and risk factor extraction, yet calls for robust, generalisable, multi-institution approaches and validation across care contexts.
Methodology
Design: Multicentre observational retrospective cohort study using the AP-HP clinical data warehouse (15 of 38 adult and paediatric hospitals included, selected for stable EHR deployment). Population: all hospitalisations from Aug 1, 2017 to Jun 30, 2022. Inclusion for SA analysis: hospitalisations caused by non-fatal, self-directed behaviour with intent to die. Exclusions: self-harm without suicidal intent, suicidal ideation only, and age <8 years at admission. Deduplication: when two SA-related hospitalisations occurred within 15 days for the same patient, only the first was counted.
Data sources: EHR clinical notes (especially last-edited discharge summaries), administrative data (age, sex, dates, in-hospital death), and ICD-10 claim diagnoses. Data extracted July 4, 2022.
NLP development: Two-stage pipeline. (1) Screening of clinical reports with a dictionary of SA-related keywords and regular expressions grouped by modality (e.g., jumping, drug overdose). (2) Stay-level classification with a neural network (RoBERTa architecture, initialised from CamemBERT and further pre-trained on 21 million French clinical reports) that validates each mention and filters out negations, mentions concerning someone other than the patient, medical history, reported speech, and hypothetical mentions. A stay was classified as SA-caused if at least one validated mention was found in its discharge summary. Implementation used EDS-NLP v0.6.1. Training and validation sets were split by hospital (10 training, 5 validation). A total of 1,571 mentions from 465 stays were annotated; inter-annotator agreement was assessed on a subset.
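The two-stage logic can be sketched as follows. This is a minimal illustration, not the authors' EDS-NLP implementation: the keyword dictionary is a tiny English-language placeholder (the study's is far larger and targets French text), and the cue-based validator merely stands in for the fine-tuned RoBERTa mention classifier.

```python
import re

# Stage 1 (screening): illustrative keyword/regex dictionary grouped by
# SA modality; the study's actual dictionary is an assumption stand-in here.
MODALITY_PATTERNS = {
    "overdose": re.compile(r"\b(overdose|intoxication)\b", re.IGNORECASE),
    "jumping": re.compile(r"\b(defenestration|jumped from)\b", re.IGNORECASE),
}

def screen(note):
    """Return (modality, matched_text) pairs for candidate SA mentions."""
    return [(mod, m.group(0))
            for mod, pat in MODALITY_PATTERNS.items()
            for m in pat.finditer(note)]

# Stage 2 (classification): the study fine-tuned a RoBERTa model to validate
# mentions; this trivial cue check only mimics its role of rejecting negated,
# historical, hypothetical, or non-patient mentions.
REJECT_CUES = ("denies", "no evidence of", "family history of")

def context_is_valid(note):
    return not any(cue in note.lower() for cue in REJECT_CUES)

def stay_is_sa_caused(discharge_summary):
    """A stay is SA-caused if at least one mention survives validation."""
    return bool(screen(discharge_summary)) and context_is_valid(discharge_summary)
```

In the real pipeline, validation happens per mention with a transformer that reads the surrounding context; the note-level cue check above is only a structural placeholder.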
Risk factor extraction: Rule-based NLP algorithms applied to discharge summaries of SA-caused stays to detect: social isolation, domestic violence, sexual violence, physical violence, and prior suicide attempt history.
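A rule-based detector of this kind can be sketched as below. The patterns are illustrative English stand-ins; the study's actual rules target French clinical text and are not reproduced here.

```python
import re

# Assumed illustrative patterns for the five risk factors named in the study.
RISK_FACTOR_PATTERNS = {
    "social_isolation": re.compile(r"\b(social isolation|lives? alone)\b", re.I),
    "domestic_violence": re.compile(r"\bdomestic (violence|abuse)\b", re.I),
    "sexual_violence": re.compile(r"\bsexual (violence|assault|abuse)\b", re.I),
    "physical_violence": re.compile(r"\bphysical (violence|assault|abuse)\b", re.I),
    "prior_sa_history": re.compile(r"\b(previous|prior|past) suicide attempts?\b", re.I),
}

def detect_risk_factors(discharge_summary):
    """Return the set of risk factors mentioned in an SA-caused stay's summary."""
    return {name for name, pat in RISK_FACTOR_PATTERNS.items()
            if pat.search(discharge_summary)}
```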
Algorithm validation: PPV for SA detection estimated by chart review of 162 discharge summaries from validation hospitals, split pre/post-pandemic; inter-annotator positive/negative agreements reported. PPV for risk factor detectors estimated on stays positive for SA and the given factor. Sensitivity not estimated due to rarity of SA in all hospitalisations.
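PPV estimation from chart review reduces to a binomial proportion with a confidence interval. The sketch below uses the Wilson score interval, a common choice for proportions at this sample size; the count of 72 confirmed stays out of 85 reviewed is an assumption for illustration, not a figure taken from the paper.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (here, PPV)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, center - half, center + half

# Illustrative only: 72 of 85 reviewed pre-pandemic summaries confirmed as SA.
ppv, low, high = wilson_ci(72, 85)
```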
Primary outcome/indicator: Monthly number of SA-caused hospitalisations overall and stratified by sex and age groups (8–17, 18–25, 26–65, 66+).
Statistical analysis: Interrupted time-series regression with seasonality adjustment to test for a post-COVID trend change: N_t = a_0 + a_1·t + a_2·(t − t_0)·1{t ≥ t_0} + Σ_m β_m·1{month(t) = m} + ε_t, where t indexes months and t_0 corresponds to March 2020 (pre-period Aug 2017–Feb 2020; post-period Mar 2020–Jun 2022). Ordinary least squares was used; 95% CIs are reported. Severity comparisons used survival analysis on length of stay (censored) and in-hospital death, with log-rank tests. An exploratory analysis compared pre/post prevalence ratios of risk factors with two-sided Fisher exact tests (significance p ≤ 0.05). Software: statsmodels v0.13.2, lifelines v0.26.4.
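One row of the segmented-regression design matrix can be built as follows. The month indexing (Aug 2017 = month 0, so the break at Mar 2020 falls at month 31) and the choice of January as the reference month for the seasonal dummies are assumptions consistent with the stated pre/post periods, not details taken from the paper.

```python
def its_design_row(t, t0, month):
    """One row of the interrupted time-series design matrix:
    intercept, linear trend t, post-break slope change (t - t0) * 1{t >= t0},
    and 11 month dummies (January as reference) for seasonality."""
    slope_change = (t - t0) if t >= t0 else 0
    month_dummies = [1 if month == m else 0 for m in range(2, 13)]
    return [1, t, slope_change] + month_dummies

# Assumed indexing: Aug 2017 = 0, so Feb 2020 = 30 and Mar 2020 = 31.
T0 = 31
row_pre = its_design_row(30, T0, month=2)   # Feb 2020: before the break
row_post = its_design_row(43, T0, month=3)  # Mar 2021: 12 months after it
```

Stacking such rows for all months and regressing the monthly SA counts on them with OLS yields the trend-change coefficient a_2 reported in the findings.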
Sensitivity analyses: (i) Alternative classification using only diagnosis claim codes; (ii) Alternative rule-based NLP SA classifier (no ML), with additional annotation to estimate PPV; (iii) Adjustment for missing data by dividing monthly SA counts by average discharge summary completeness; (iv) Per-hospital analyses.
Key Findings
- Cohort: 2,911,920 hospitalisations; 14,023 SA-related hospitalisations (0.5%), involving 11,786 individuals; pre-COVID 5,954 (42.5%), post-COVID 8,069 (57.5%). Mean age 38.0 years (SD 20.7). Female 64.3% (9,015/14,023), male 35.7% (5,008/14,023).
- NLP performance: Main hybrid SA algorithm PPV 0.85 pre (95% CI 0.76–0.91; n=85) and 0.86 post (95% CI 0.76–0.92; n=77), outperforming alternative rule-based algorithm (PPV ~0.51–0.52). Risk factor detectors PPV ranged 0.83–1.00 across periods. Inter-annotator agreements: SA detection positive/negative agreements 0.92/0.50; risk factors mostly 1.0 for both.
- Interrupted time-series (seasonally adjusted): Overall post-COVID slope increase in monthly SA hospitalisations 3.7 (95% CI 2.1–5.3). By subgroup: girls 8–17 years 1.8 (95% CI 1.2–2.5); women 18–25 years 1.1 (95% CI 0.7–1.5); males overall 0.9 (95% CI 0.2–1.6). Residuals showed no notable time trend, indicating adequate model fit.
- Sensitivity analyses: Findings robust when using the alternative rule-based NLP, completeness adjustment, and per-hospital aggregation (though individual hospitals often lacked significance). Claim-based classification did not yield significant effects.
- Characteristics of SA hospitalisations: The mix of methods remained stable over time, with intentional drug overdose more frequent among women. Time to hospital exit was shorter post-COVID and in-hospital death was less frequent (length of stay p<0.001; death p=0.007); short stays became shorter post-COVID while longer stays were unchanged.
- Risk factors (pre vs post prevalence ratios, overall): Domestic violence PR 1.3 (95% CI 1.16–1.48, p<0.0001); Physical violence PR 1.3 (95% CI 1.10–1.64, p=0.0047); Sexual violence PR 1.7 (95% CI 1.48–1.98, p<0.0001). Social isolation PR 1.2 (95% CI 1.09–1.39, p=0.00071). Prior SA history PR 1.1 (95% CI 1.05–1.17, p<0.0001). Violence factors were more frequently reported in females both before and after the outbreak.
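Prevalence ratios of the kind reported above can be computed with a log-normal confidence interval; the sketch below uses illustrative synthetic counts (the study's underlying counts are not given here), and the CI method is a standard choice rather than necessarily the authors'.

```python
import math

def prevalence_ratio(a, n1, b, n2, z=1.96):
    """Post/pre prevalence ratio with a log-normal 95% CI.
    a/n1 = post-period prevalence, b/n2 = pre-period prevalence."""
    pr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)  # SE of log(PR)
    return pr, pr * math.exp(-z * se), pr * math.exp(z * se)

# Synthetic example: a factor mentioned in 130/1000 post vs 100/1000 pre stays.
pr, low, high = prevalence_ratio(130, 1000, 100, 1000)
```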
Discussion
Applying NLP to multi-hospital EHRs produced timely, population-level mental health indicators that detected the pandemic-associated rise in SA, particularly among girls and young women, aligning with external evidence. The robust performance of a hybrid ML-rule-based NLP approach enabled identification of rare events across millions of notes and across heterogeneous hospital settings. The increased reporting of violence (domestic, physical, sexual) among SA-related stays post-COVID underscores the likely contribution of violence to the disproportionate impact on females and youths. These indicators, if deployed prospectively, could function as early warning tools to inform public health and clinical responses, guide resource allocation, and design targeted prevention strategies. The multi-institution aggregation was crucial, as single-hospital analyses often lacked power, supporting the relevance of regional-scale EHR-based surveillance.
Conclusion
NLP applied to structured and unstructured EHR data across multiple hospitals can yield actionable surveillance indicators of suicidality. Retrospective analysis during the COVID-19 period revealed a significant increase in SA hospitalisations, mainly among girls and young women, and highlighted increased reporting of violence as a key associated factor. This approach can complement existing surveillance systems and inform prevention efforts, particularly those addressing violence against women at early ages. Future work should implement prospective, real-time monitoring, expand extracted variables (e.g., care utilization, socioeconomic determinants), and integrate these tools within clinical workflows to enhance SA prevention.
Limitations
- Observational retrospective design precludes causal inference.
- Utility for crisis response needs confirmation via prospective deployment.
- NLP detects clinician-reported SA and risk factors; reporting may vary by clinician practices, experience, and EHR usability.
- Emergency department visits not resulting in hospitalisation were excluded, biasing toward more severe SA and limiting assessment across the full severity spectrum.
- Sensitivity of the SA detection model was not estimated due to rarity of events and annotation burden.